Abstract

The most successful unfolding rules used nowadays in the partial evaluation of logic pro- 
grams are based on well quasi orders (wqo) applied over (covering) ancestors, i.e., a sub- 
sequence of the atoms selected during a derivation. Ancestor (sub) sequences are used to 
increase the specialization power of unfolding while still guaranteeing termination and 
also to reduce the number of atoms for which the wqo has to be checked. Unfortunately, 
maintaining the structure of the ancestor relation during unfolding introduces significant 
overhead. We propose an efficient, practical local unfolding rule based on the notion of 
covering ancestors which can be used in combination with a wqo and allows a stack-based 
implementation without losing any opportunities for specialization. Using our technique, 
certain non-leftmost unfoldings are allowed as long as local unfolding is performed, i.e., we
cover depth-first strategies. To deal with practical programs, we propose assertion-based 
techniques which allow our approach to treat programs that include (Prolog) built-ins 
and external predicates in a very extensible manner, for the case of leftmost unfolding. 
Finally, we report on our implementation of these techniques embedded in a practical 
partial evaluator, which shows that our techniques, in addition to dealing with practi- 
cal programs, are also significantly more efficient in time and somewhat more efficient in 
memory than traditional tree-based implementations. To appear in Theory and Practice 
of Logic Programming (TPLP). 

KEYWORDS: Partial Evaluation, Partial Deduction, Logic Programming, Prolog, SLD 
semantics, Local Unfolding. 



1 Introduction 

The main purpose of partial evaluation (see (Jones et al. 1993) for a general text
on the area) is to specialize a given program w.r.t. part of its input data, hence

* A preliminary version of this work appeared in the Post-proceedings of LOPSTR'04, LNCS 
3573, Springer-Verlag, 2005.

it is also known as program specialization. Essentially, partial evaluators are non-standard
interpreters which evaluate expressions while enough information is available
and residualize them otherwise. The partial evaluation of logic programs is
usually known as partial deduction (Lloyd and Shepherdson 1991; Gallagher 1993).
Informally, the partial deduction algorithm proceeds as follows. Given an input pro- 
gram and a set of atoms, the first step consists in applying an unfolding rule to 
compute finite (possibly incomplete) SLD trees for these atoms. This step returns 
a set of resultants (or residual rules), i.e., a program, associated to the root-to-leaf 
derivations of these trees. Then, an abstraction operator is applied to properly add 
the atoms in the bodies of resultants to the set of atoms to be partially evaluated. 
The abstraction phase yields a new set of atoms, some of which may in turn need 
further evaluation and, thus, the process is iteratively repeated while new atoms 
are introduced. The number of such new atoms which can be introduced can in 
general be unbounded. The termination of the partial deduction process is ensured 
by two control issues. Following the terminology of (Gallagher 1993), the so-called
local control defines an unfolding rule which determines how to construct finite 
SLD trees. The global control defines an abstraction operator which guarantees 
that the number of new atoms is kept finite. Termination of the partial deduction 
algorithm involves ensuring termination both at the local and global levels. We 
refer to (Leuschel and Bruynooghe 2002) for a survey on both control issues. This
article is centered on the local control, namely on the development of a practical, 
efficient unfolding rule. The techniques we will propose for local control can be used 
in combination with any global control strategy. 
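
The interplay between local and global control described above can be sketched as a simple worklist loop. This is only a minimal illustration in Python under stated assumptions, not the algorithm of any particular partial evaluator: the names `partial_deduction`, `unfold` and `abstract` are hypothetical, atoms are plain strings, and the abstraction operator is assumed to return just the atoms that still need to be processed (real abstraction operators may also generalize atoms, which is what keeps the set finite).

```python
def partial_deduction(initial_atoms, unfold, abstract):
    """Iterate local control (unfold) and global control (abstract).

    unfold(atom)         -> list of resultants (head, [body atoms])
    abstract(known, new) -> set of atoms that still have to be evaluated
    Both callables are assumptions of this sketch.
    """
    known = set(initial_atoms)
    pending = list(initial_atoms)
    residual = {}
    while pending:
        atom = pending.pop()
        resultants = unfold(atom)          # finite SLD tree -> resultants
        residual[atom] = resultants
        body_atoms = {b for _, body in resultants for b in body}
        for new_atom in sorted(abstract(known, body_atoms)):
            if new_atom not in known:      # only new atoms trigger more work
                known.add(new_atom)
                pending.append(new_atom)
    return residual


# Toy instance: atoms are strings and unfolding is a table lookup.
TOY_RULES = {"p": [("p", ["q"])], "q": [("q", [])]}
toy_result = partial_deduction(
    ["p"],
    unfold=lambda atom: TOY_RULES[atom],
    abstract=lambda known, new: new - known,
)
```

In this toy run the loop stops once no new atoms are introduced; in general, termination depends on the unfolding rule producing finite trees and on the abstraction operator keeping the set of atoms finite.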

We believe that two factors limiting the general uptake of partial deduction are: 
1) the relative inefficiency of the partial deduction method, and 2) the complications 
brought about by the treatment of real programs. Indeed, the integration of powerful 
strategies in the unfolding rule — like the use of wqos combined with the ancestor 
relation — can introduce a significant cost both in time and memory consumption of 
the specialization process. Regarding the treatment of real programs which include 
external predicates, non-declarative features, etc., the complications range from 
how to identify which predicates include these non-declarative features (ad-hoc but 
difficult to maintain tables are often used in practice for this purpose) to how to 
deal with such predicates during partial deduction. Also, the optimal treatment 
of these predicates during partial deduction often requires information which can 
only be available at partial deduction time if a global analysis of the program is 
performed. Our main objective in this work is to propose some novel solutions to 
these issues. 

State-of-the-art partial evaluators integrate terminating unfolding rules for local 
control based on wqos, like homeomorphic embedding (Kruskal 1960; Leuschel and Bruynooghe 2002),
which can obtain very powerful optimizations. Moreover, they allow performing the 
ordering comparisons over subsequences of the full sequence of the selected atoms. 
In particular, the use of ancestors for refining sequences of visited atoms, originally 
proposed in (Bruynooghe et al. 1992), greatly improves the specialization power of
unfolding while still guaranteeing termination and also reduces the length of the 
sequences for which the embedding order for the new atoms has to be checked. Unfortunately,
having to maintain dependency information for the individual atoms
in each derivation during the generation of SLD trees has turned out to introduce 
overheads which seem to cancel out the theoretical efficiency gains expected. In 
order to address this issue, in this article, we introduce ASLD resolution as the ba- 
sis for an efficient, stack-based implementation technique of a local unfolding rule 
relying on the notion of covering ancestors. Our technique can significantly reduce 
the overhead incurred by the use of covering ancestors without losing any oppor- 
tunities for specialization. We outline as well a generalization that allows certain 
non-leftmost unfoldings with the same assurances. 

In order to deal with real programs that include (Prolog) built-ins and ex- 
ternal predicates, we extend ASLD resolution and the ancestor-based local un- 
folding rule to handle these predicates by relying on assertion-based techniques 
(Puebla et al. 2000). The use of assertions provides extensibility in the sense that
users and developers of partial evaluators can deal with new external predicates 
during partial evaluation by just adding the proper assertions to these predicates 
— without having to maintain ad-hoc tables or modifying the partial evaluator it- 
self. We report on an implementation of our technique in a practical, state-of-the- 
art partial evaluator, embedded in a production compiler which uses assertions and 
global analysis extensively (the Ciao compiler (Bueno et al. 2004) and, specifically,
its preprocessor CiaoPP (Hermenegildo et al. 2005)). We believe that our experimental
results provide evidence that our technique pays off in practice and can
thus contribute to the practicality of state-of-the-art partial evaluation techniques. 

An important observation is that the techniques that we propose in this article to 
control the unfolding process are useful in the context of online partial evaluation. 
Traditionally, two approaches to partial evaluation have been considered, online and 
offline partial evaluation (see (Leuschel et al. 2004; Leuschel and Bruynooghe 2002)).
In online partial evaluation all control decisions are taken on the fly during the spe- 
cialization phase, by keeping track of the specialization history (e.g., the ancestor 
subsequences). In the offline approach, all control decisions are taken before the spe- 
cialization phase proper. These control decisions are based on abstract descriptions 
of the data instead of the actual data. The control strategy is usually represented 
as program annotations which are the sole decision criteria for control of the partial 
evaluator. For instance, regarding local control, an annotation can explicitly indi- 
cate that an atom should not be unfolded. Regarding global control, annotations 
typically specify for each call which arguments have to be generalised away (i.e., 
replaced by variables). Such annotations are generated automatically in some partial
evaluators by a binding-time analysis (Craig et al. 2004), while in other partial
evaluators they are manually provided by the user, either in part or in full. The ad- 
vantages of the offline approach are that, once all control annotations are available, 
partial evaluation is quite simple and efficient. On the other hand, online partial
evaluation, while usually less efficient, tends to have a more powerful control strategy
since control decisions are based on actual data instead of abstract descriptions
of data. In principle, one could argue that both approaches are equally powerful
(see (Christensen and Glück 2004)) and that the offline approach can be more appropriate
if the output of a global program analysis is available, while online partial
evaluators usually only consider local, runtime information. In this work, we are
interested in proposing novel techniques which help improve the efficiency of online 
partial evaluation. 

The structure of the article is as follows. Section 2 presents some required background
on local control during partial deduction. Section 3 shows by means of an
example why using ancestors is needed. Section 4 presents ASLD resolution as the
basis for an efficient unfolding rule based on ancestors which allows a stack-based
implementation. Section 5 extends the unfolding techniques to the case of external
predicates. Section 6 presents some experimental results which compare the
performance of different unfolding strategies with several implementations. Finally,
Section 7 discusses some related work and concludes.

2 Background 

We assume some basic knowledge on the terminology of logic programming. See, for
example, (Lloyd 1987) for details.

Very briefly, an atom $A$ is a syntactic construction of the form $p(t_1, \ldots, t_n)$,
where $p/n$, with $n \geq 0$, is a predicate symbol and $t_1, \ldots, t_n$ are terms. The function
$pred$ applied to atom $A$, i.e., $pred(A)$, returns the predicate symbol $p/n$ for $A$. A
clause is of the form $H \leftarrow B$ where its head $H$ is an atom and its body $B$ is a
conjunction of atoms. A definite program is a finite set of clauses. A goal (or query)
is a conjunction of atoms.

We denote by $\{X_1 \mapsto t_1, \ldots, X_n \mapsto t_n\}$ the substitution $\sigma$ with $\sigma(X_i) = t_i$
for $i = 1, \ldots, n$ (with $X_i \neq X_j$ if $i \neq j$), and $\sigma(X) = X$ for all other variables
$X$. Given an atom $A$, $\theta(A)$ denotes the application of substitution $\theta$ to $A$. Given
two substitutions $\theta_1$ and $\theta_2$, we denote by $\theta_1\theta_2$ their composition. The identity
substitution is denoted by $id$.

A term $t'$ is an instance of $t$ if there is a substitution $\sigma$ with $t' = \sigma(t)$.
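
To make the notation for substitutions concrete, the following is a minimal Python sketch of applying and composing substitutions. The term representation is an assumption of the sketch: variables are strings starting with an uppercase letter, compound terms and atoms are tuples `(functor, arg1, ...)`, and anything else is a constant. The helper names `apply_subst` and `compose` are hypothetical.

```python
def is_var(t):
    """Variables are strings starting with an uppercase letter (sketch convention)."""
    return isinstance(t, str) and t[:1].isupper()


def apply_subst(sub, t):
    """Apply substitution `sub` (a dict variable -> term) to term `t`."""
    if is_var(t):
        return sub.get(t, t)
    if isinstance(t, tuple):               # compound term: (functor, arg1, ...)
        return (t[0],) + tuple(apply_subst(sub, a) for a in t[1:])
    return t                               # constant


def compose(s1, s2):
    """Composition s1 s2: applying it equals applying s1 first, then s2."""
    out = {v: apply_subst(s2, t) for v, t in s1.items()}
    for v, t in s2.items():
        out.setdefault(v, t)
    return {v: t for v, t in out.items() if v != t}   # drop identity bindings


theta1 = {"X": ("f", "Y")}
theta2 = {"Y": "a"}
atom = ("p", "X", "Y")
composed = compose(theta1, theta2)
```

Here `apply_subst(composed, atom)` gives the same atom as applying `theta1` and then `theta2`, and the empty dictionary plays the role of the identity substitution.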

2.1 Basics of partial deduction

The concept of computation rule is used to select an atom within a goal for its 
evaluation. 

Definition 1 (computation rule)

A computation rule is a function $\mathcal{R}$ from goals to atoms. Let $G$ be a goal of the
form $\leftarrow A_1, \ldots, A_R, \ldots, A_k$, $k \geq 1$. If $\mathcal{R}(G) = A_R$ we say that $A_R$ is the selected
atom in $G$.

The operational semantics of definite programs is based on derivations.

Definition 2 (derivation step)

Let $G$ be $\leftarrow A_1, \ldots, A_R, \ldots, A_k$. Let $\mathcal{R}$ be a computation rule and let $\mathcal{R}(G) = A_R$.
Let $C = H \leftarrow B_1, \ldots, B_m$ be a renamed apart clause in $P$. Then $G'$ is derived from
$G$ and $C$ via $\mathcal{R}$ if the following conditions hold:

$\theta = mgu(A_R, H)$

$G'$ is the goal $\leftarrow \theta(B_1, \ldots, B_m, A_1, \ldots, A_{R-1}, A_{R+1}, \ldots, A_k)$

The definition above differs from standard formulations (such as that in (Lloyd 1987))
in that the atoms newly introduced in $G'$ are not placed in the same position where
the selected atom $A_R$ used to be, but rather they are placed to the left of any
atom in $G$. For definite programs, this is correct since goals are conjunctions, which
enjoy the commutative property. This modification will become instrumental to the
operational semantics we propose in forthcoming sections. The reordering is not correct,
though, for programs with extra-logical predicates, as we will discuss in Section 5. Also,
it is well known that changing the positions of atoms might not preserve finite failure.
Although our general notion of resolution allows reordering the atoms, in a
practical system we can allow only leftmost unfolding and still obtain significant
improvements (as will be explained in Section 5).
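
As an illustration of the derivation step of Definition 2, the following minimal Python sketch resolves the selected atom against a clause and places the instantiated body atoms to the left of all remaining atoms. It is a sketch under stated assumptions: terms are tuples `(functor, arg1, ...)`, variables are strings starting with an uppercase letter, unification omits the occurs check, and all function names (`unify`, `resolve`, `rename`, `derive`) are hypothetical.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Return an mgu extending s, or None. No occurs check (sketch only)."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def resolve(t, s):
    """Apply the bindings in s exhaustively to term t."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(x, s) for x in t[1:])
    return t


def rename(t, suffix):
    """Rename apart the variables of a clause term."""
    if is_var(t):
        return t + suffix
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(x, suffix) for x in t[1:])
    return t


def derive(goal, r, clause, suffix="_1"):
    """One derivation step: select goal[r] and resolve it against clause."""
    head, body = clause
    head = rename(head, suffix)
    body = [rename(b, suffix) for b in body]
    theta = unify(goal[r], head, {})
    if theta is None:
        return None
    rest = goal[:r] + goal[r + 1:]
    # Body atoms go to the LEFT of all remaining atoms, as in Definition 2.
    return [resolve(a, theta) for a in body + rest]


clause = (("p", "a", "Z"), [("r", "Z")])   # p(a,Z) :- r(Z).
goal = [("q", "X"), ("p", "a", "Y")]       # <- q(X), p(a,Y)
new_goal = derive(goal, 1, clause)
```

Selecting the second atom of the goal yields a new goal whose first atom comes from the clause body, even though the selected atom was not leftmost.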

As customary, given a program $P$ and a goal $G$, an SLD derivation for $P \cup \{G\}$
consists of a possibly infinite sequence $G = G_0, G_1, G_2, \ldots$ of goals, a sequence
$C_1, C_2, \ldots$ of properly renamed apart clauses of $P$, and a sequence of computed
answer substitutions $\theta_1, \theta_2, \ldots$ (or mgus) such that each $G_{i+1}$ is derived from $G_i$
and $C_{i+1}$ using $\theta_{i+1}$. If $G_i$ is of the form $\leftarrow A_1, \ldots, A_R, \ldots, A_k$ and
$G_{i+1} = \theta(B_1, \ldots, B_m, A_1, \ldots, A_{R-1}, A_{R+1}, \ldots, A_k)$ is derived from $G_i$ (as stated in
Definition 2), we say that each atom $\theta(A_i)$ with $i = 1, \ldots, R-1, R+1, \ldots, k$ is the
instance originating from $A_i$. Finally, we say that the SLD derivation is composed
of the subsequent goals $G_0, G_1, G_2, \ldots$.

A derivation step can be non-deterministic when $A_R$ unifies with several clauses in
$P$, giving rise to several possible SLD derivations for a given goal. Such SLD derivations
can be organized in SLD trees. A finite derivation $G = G_0, G_1, G_2, \ldots, G_n$ is
called successful if $G_n$ is empty. In that case $\theta = \theta_1\theta_2 \cdots \theta_n$ is called the computed
answer for goal $G$. Such a derivation is called failed if it is not possible to perform
a derivation step with $G_n$.



In order to compute a partial deduction (Lloyd and Shepherdson 1991), given an
input program and a set of atoms, the first step consists in applying an unfolding
rule to compute finite (possibly incomplete) SLD trees for these atoms. Then, a set
of resultants or residual rules is systematically extracted from the SLD trees.¹

Definition 3 (unfolding rule)

Given an atom $A$, an unfolding rule computes a set of finite SLD derivations
$D_1, \ldots, D_n$ (i.e., a possibly incomplete SLD tree) of the form $D_i = A, \ldots, G_i$ with
(a composed) computed answer substitution $\theta_i$ for $i = 1, \ldots, n$ whose associated
resultants are $\theta_i(A) \leftarrow G_i$.

A partial evaluation for the initial atom is then defined as the set of resultants,
i.e., a program, associated to the root-to-leaf derivations for the computed SLD
tree. The partial evaluation for a set of atoms is defined as the union of the partial
evaluations for each atom in the set. We refer to (Leuschel and Bruynooghe 2002)
for details.



1 Let us note that the definition of a partial deduction algorithm requires, in addition to an
unfolding rule, the so-called global control level (see Section 1).

2.2 Termination of local control

In order to ensure the local termination of the partial deduction algorithm while pro- 
ducing useful specializations, the unfolding rule must incorporate some non-trivial 
mechanism to stop the construction of SLD trees. Nowadays, well-quasi orderings
(wqo) (Sørensen and Glück 1995; Leuschel 1998) are broadly used in the context
of online partial evaluation techniques.

It is well known that the use of wqos allows the definition of admissible sequences
which are always finite. Intuitively, a sequence of elements $s_1, s_2, \ldots$ in $S$ is called
admissible with respect to an order $\leq_S$ (Bruynooghe et al. 1992) iff there are no
$i < j$ such that $s_i \leq_S s_j$. The next definition captures this idea.

Definition 4 (admissible-wqo)

Let $(A_1, \ldots, A_n)$ be a sequence of atoms and $A$ be a new atom to be added to the
sequence. Let $\leq_S$ be a wqo. We denote by $Admissible(A, (A_1, \ldots, A_n), \leq_S)$, with
$n \geq 0$, the truth value of the expression $\forall A_i, i \in \{1, \ldots, n\} : A \not\geq_S A_i$.

Given a derivation $G_1, G_2, \ldots, G_{n+1}$, in order to decide whether to evaluate $G_{n+1}$
or not, we check that the selected atom in $G_{n+1}$ is not greater than or equal to
any previous comparable selected atom (Leuschel 2002b). Observe that the ancestor
test is only applied on comparable atoms, i.e., ancestor atoms with the same
predicate symbol. This corresponds to the original notion of covering ancestors
(Bruynooghe et al. 1992). Note that $A_1, \ldots, A_n$ in the above definition refer to the
selected atoms in $G_1, \ldots, G_n$ and $A$ refers to the selected atom in $G_{n+1}$.
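
The admissibility test of Definition 4, together with the restriction to comparable atoms, can be sketched in Python as follows. This is a minimal illustration, not the implementation discussed in this article: the function names are hypothetical, atoms are tuples `(predicate, arg1, ...)`, and the order used in the example (comparison of a natural-number argument) is only a stand-in for a real wqo such as homeomorphic embedding.

```python
def admissible(new_atom, previous, leq):
    """Admissible(A, (A1..An), <=_S): there is no earlier Ai with Ai <=_S A."""
    return not any(leq(old, new_atom) for old in previous)


def same_predicate(a, b):
    """Comparable atoms share predicate name and arity."""
    return a[0] == b[0] and len(a) == len(b)


def admissible_comparable(new_atom, previous, leq):
    """Same test, restricted to atoms comparable with the new one."""
    comparable = [old for old in previous if same_predicate(old, new_atom)]
    return admissible(new_atom, comparable, leq)


# Stand-in wqo for the example only: compare the natural-number argument.
def leq_by_arg(a, b):
    return a[1] <= b[1]


history = [("count", 5), ("size", 0), ("count", 3)]
```

With this history, `("count", 2)` is admissible when only comparable atoms are considered, while `("count", 4)` is not because it is greater than or equal to `("count", 3)`.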

Among the wqo, the homeomorphic embedding ordering (Kruskal 1960) has proved
to be very powerful in practice. We recall the definition of homeomorphic embedding,
which can be found for instance in Leuschel's work (Leuschel 1998).

Definition 5 ($\trianglelefteq$)

Given two atoms $A = p(t_1, \ldots, t_n)$ and $B = p(s_1, \ldots, s_n)$, we say that $B$ embeds
$A$, written $A \trianglelefteq B$, if $t_i \trianglelefteq s_i$ for all $i$ s.t. $1 \leq i \leq n$. The embedding relation over
terms, also written $\trianglelefteq$, is defined by the following rules:

1. $Y \trianglelefteq X$ for all variables $X, Y$.

2. $s \trianglelefteq f(t_1, \ldots, t_n)$ if $s \trianglelefteq t_i$ for some $i$.

3. $f(s_1, \ldots, s_n) \trianglelefteq f(t_1, \ldots, t_n)$ if $s_i \trianglelefteq t_i$ for all $i$, $1 \leq i \leq n$.

Informally, atom $t_1$ embeds atom $t_2$ if $t_2$ can be obtained from $t_1$ by deleting
some operators, e.g., $f(g(A, B), h(C, s(D)))$ embeds $f(A, h(C, D))$.
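
The three rules of the embedding relation translate almost directly into a recursive check. The following Python sketch is illustrative only: it assumes that variables are strings starting with an uppercase letter, that compound terms and atoms are tuples `(functor, arg1, ...)`, and that constants are any other values (treated as function symbols of arity 0). The names `embedded_in` and `atom_embedded_in` are hypothetical.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def embedded_in(s, t):
    """True iff term t embeds term s (rules 1 to 3 of the definition)."""
    if is_var(s) and is_var(t):                        # rule 1
        return True
    if isinstance(t, tuple):
        if any(embedded_in(s, arg) for arg in t[1:]):  # rule 2 (diving)
            return True
        if isinstance(s, tuple) and s[0] == t[0] and len(s) == len(t):
            return all(embedded_in(a, b)               # rule 3 (coupling)
                       for a, b in zip(s[1:], t[1:]))
        return False
    # t is a constant or a variable here: only an identical constant matches
    return not is_var(s) and not isinstance(s, tuple) and s == t


def atom_embedded_in(a, b):
    """Atoms: same predicate symbol and argument-wise embedding."""
    return (a[0] == b[0] and len(a) == len(b)
            and all(embedded_in(x, y) for x, y in zip(a[1:], b[1:])))


small = ("f", "A", ("h", "C", "D"))                    # f(A,h(C,D))
big = ("f", ("g", "A", "B"), ("h", "C", ("s", "D")))   # f(g(A,B),h(C,s(D)))
```

On the example from the text, `embedded_in(small, big)` holds while `embedded_in(big, small)` does not. Atoms that are equal modulo renaming embed each other, which is why a repeated call pattern is detected by this order.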

2.3 Covering ancestors 

State-of-the-art unfolding rules allow performing ordering comparisons over sub- 
sequences of the full sequence of the selected atoms of a derivation by organizing 
atoms in a proof tree (Bruynooghe 1991), achieving further specialization in many
cases while still guaranteeing termination. To do so, they maintain dependencies 
over the selected atoms which are chosen in such a way that only a subsequence 
of such selected atoms needs to be considered. The essence of the most advanced 
techniques is based on the notion of covering ancestors (Bruynooghe et al. 1992).

    qsort([],R,R).
    qsort([X|L],R,R2) :-
        partition(L,X,L1,L2),
        qsort(L2,R1,R2),
        qsort(L1,R,[X|R1]).

    partition([],_,[],[]).
    partition([E|R],C,[E|Left1],Right) :-
        E =< C,
        partition(R,C,Left1,Right).
    partition([E|R],C,Left,[E|Right1]) :-
        E > C,
        partition(R,C,Left,Right1).

Fig. 1. A quick-sort program




Definition 6 (ancestor relation)

Given a derivation step and $A_R$, $B_i$, $i = 1, \ldots, m$ as in Definition 2, we say that $A_R$
is the parent of the instance of $B_i$, $i = 1, \ldots, m$, in the goal $G'$ and in each subsequent
goal where the instance originating from $B_i$ appears. The ancestor relation is the
transitive closure of the parent relation.

The important observation is that a derivation can contain selected subgoals which 
are indeed part of a different branch in the proof tree. 

Given an atom $A$ and a derivation $D$, we denote by $Ancestors(A, D)$ the sequence
of (comparable) ancestors of $A$ in $D$ as defined in Definition 6. It captures the
dependency relation implicit within a proof tree.

It has been proved (Bruynooghe et al. 1992) that any infinite derivation must
have at least one inadmissible covering ancestor sequence, i.e., a subsequence of the
atoms selected during a derivation. Therefore, it is sufficient to check the selected
ordering relation $\leq_S$ over the covering ancestor subsequences in order to detect
inadmissible derivations.

Definition 7 (safe step)

An SLD step is safe with respect to a wqo if the covering ancestor sequence of the
selected atom is admissible with respect to that order.

The above definition is extended to derivations as follows.

Definition 8 (safe derivation)

An SLD derivation is safe with respect to a wqo if all covering ancestor sequences 
of the selected atoms are admissible with respect to that order. 

Otherwise, the SLD derivation is considered unsafe. 



3 The Usefulness of Ancestors 

We now illustrate some of the ideas discussed so far and, especially, the relevance
of ancestor tracking, through an example. Our running example is the program
in Figure 1, which implements the well-known quick-sort algorithm, "qsort", using
difference lists. Given an initial atom of the form qsort(List,Result,Cont),
where List is a list of numbers, the algorithm returns in Result a sorted difference
list which is a permutation of List and such that its continuation is Cont. For example,
for the query qsort([1,1,1],L,[]), the program should compute L=[1,1,1],
constructing a finite SLD tree. Notice that, in general, if the input arguments to a
program are not sufficiently instantiated, the corresponding SLD tree can be infinite
and/or contain incomplete derivations.

    1. qs([1,1,1],R,[]) {}
      |
    2. p([1,1],1,L1,L2) {1}, 3. qs(L2,R1,[]) {1}, 4. qs(L1,R,[1|R1]) {1}
      | {L1 ↦ [1|L]}
    5. 1 =< 1 {1,2}, 6. p([1],1,L,L2) {1,2}, 3. qs(L2,R1,[]) {1}, 4. qs([1|L],R,[1|R1]) {1}
      |
    6. p([1],1,L,L2) {1,2}, 3. qs(L2,R1,[]) {1}, 4. qs([1|L],R,[1|R1]) {1}
      | {L ↦ [1|L']}
    7. 1 =< 1 {1,2,6}, 8. p([],1,L',L2) {1,2,6}, 3. qs(L2,R1,[]) {1}, 4. qs([1,1|L'],R,[1|R1]) {1}
      |
    8. p([],1,L',L2) {1,2,6}, 3. qs(L2,R1,[]) {1}, 4. qs([1,1|L'],R,[1|R1]) {1}
      | {L' ↦ [], L2 ↦ []}
    3. qs([],R1,[]) {1}, 4. qs([1,1],R,[1|R1]) {1}
      |
    4. qs([1,1],R,[1]) {1}
      |
    9. p([1],1,L1',L2') {1,4}, 10. qs(L2',R1',[1]) {1,4}, 11. qs(L1',R,[1|R1']) {1,4}

Fig. 2. Derivation with Ancestor Annotations

Consider now Figure 2, which presents an incomplete SLD derivation for our
quick-sort program and the query qsort([1,1,1],R,[]) using a leftmost unfolding
rule. For conciseness, predicates qsort and partition are abbreviated as qs
and p, respectively, in the figure. Note that each atom is labeled with a number (an
identifier) for future reference² and a superscript which contains the list of ancestors
of that atom. Let us assume that we use the homeomorphic embedding order
(Leuschel 1998) as wqo. If we check admissibility w.r.t. the full sequence of atoms,
i.e., we do not use the ancestor relation, the derivation will stop when atom number
9, i.e., p([1],1,L1',L2'), is found for the second time. The reason is that this atom is
greater than or equal to atom number 6, which was selected in the third step; indeed,
they are equal modulo renaming.³

2 By abuse of notation, we keep the same number for each atom throughout the derivation
although it may be further instantiated (and thus modified) in subsequent steps. This will
become useful for continuing the example later.

3 Let us note that the two calls to the builtin predicate =< which appear in the derivation can be
executed since the arguments are properly instantiated. However, they have not been considered
in the admissibility test since these calls do not endanger the termination of the derivation, as
we will discuss in Section 5.

Fig. 3. Proof tree for the example

This unfolding rule is too conservative, since the process can proceed further
without risking termination (in fact, the SLD tree for a leftmost computation rule
for the example query is finite and thus the query can safely be fully unfolded).
The crucial point is that the execution of atom number 9 does not depend on atom
number 6 (and, actually, the unfolding of 6 has been already completed when atom
number 9 is being considered for unfolding). In order to illustrate this, consider
Figure 3, which shows the proof tree associated to this derivation. Nodes are labeled
with the numbers assigned to each atom, instead of the atoms themselves. Note that,
in order to decide whether or not to evaluate atom number 9, it is only necessary
to check that it is not greater than or equal to atoms 4 and 1, i.e., those which are
its ancestors in the proof tree. On the other hand, and as we saw before, if the full
derivation is considered instead, as in Figure 2, atom 9 will also be compared with
atom 6, concluding imprecisely that the derivation may not be safe.
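
The difference between the two tests in this example can be made concrete with a small Python sketch. The parent map below is read off the ancestor annotations of Figure 2; the function names and the representation of atoms by their identifiers are assumptions of the sketch, and the restriction to comparable atoms (same predicate symbol) is left out for brevity.

```python
# Parent of each numbered atom, read off the ancestor annotations of Figure 2.
PARENT = {2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 6, 8: 6, 9: 4, 10: 4, 11: 4}

# Atoms selected before atom 9, in selection order.
SELECTED_BEFORE_9 = [1, 2, 5, 6, 7, 8, 3, 4]


def ancestors(atom, parent):
    """Ancestors of `atom`, nearest first (transitive closure of parent)."""
    chain = []
    while atom in parent:
        atom = parent[atom]
        chain.append(atom)
    return chain


def to_check(atom, parent, selected, use_ancestors):
    """Atoms against which the wqo test has to compare `atom`."""
    if use_ancestors:
        return ancestors(atom, parent)
    return list(selected)
```

For atom 9 the ancestor-based test only looks at atoms 4 and 1, whereas the test over the full sequence also includes atom 6, which is the comparison that stops the derivation prematurely.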

Despite their obvious relevance, unfortunately the practical applicability of un- 
folding rules based on the notion of covering ancestor is threatened by the overhead 
introduced by the implementation of this notion. A naive implementation of the 
notion of ancestor keeps, for each atom, the list of its ancestors, as depicted
in Figure 2 by using superscripts. This implementation is relatively efficient
in time but presents a high overhead in memory consumption. Our experiments 
show that the partial evaluator can run out of memory even for simple examples. 
A more reasonable implementation maintains the proof tree as a global structure. 
In a symbolic language, this greatly reduces memory consumption but the cost of 
traversing the tree for retrieving the ancestors of each atom introduces a signifi- 
cant slowdown in the partial evaluation process. We argue that our implementation 
technique is efficient in time and space, overcoming the above limitations. 



4 An Efficient Implementation for Local Unfolding

In this section, we first define the notion of local computation rule. We then introduce
ASLD resolution, a modification of SLD which incorporates ancestor stacks
and which is the basis of our efficient implementation. Interestingly, we then impose
the local condition on the computation rule in order to ensure accurate results for
ASLD resolution.

4.1 A local computation rule

Our definition of local unfolding is based on the notion of ancestor depth.

Definition 9 (ancestor depth)

Given an SLD derivation $D = G_0, \ldots, G_m$ with $G_m = \leftarrow A_1, \ldots, A_k$, $k \geq 1$, the
ancestor depth of $A_i$ for $i = 1, \ldots, k$, denoted $depth(A_i, D)$, is the cardinality of the
ancestor relation for $A_i$ in $D$.

Intuitively, the ancestor depth of an atom in a goal is the depth at which this atom 
is located in the proof tree associated to the derivation. 

Definition 10 (local computation rule)

A computation rule $\mathcal{R}$ is local if $\forall D = G_0, \ldots, G_n$ such that $G_i = \leftarrow A_{i1}, \ldots, A_{im_i}$
for $i = 0, \ldots, n$, it holds that $depth(\mathcal{R}(G_i), D) \geq depth(A_{ij}, D)$ $\forall j = 1, \ldots, m_i$.

Intuitively, a computation rule is local if it always selects one of the atoms which 
is deepest in the proof tree for the derivation. As a result, local computation rules 
traverse proof trees in a depth-first fashion, though not necessarily left to right nor in 
any other fixed order. Thus, in principle, in order to implement a local computation 
rule we need to record (part of) the derivation history (i.e., its proof tree). Note 
that the computation rule used in most implementations of logic programming 
languages, such as Prolog, always selects the leftmost atom. This computation rule, 
often referred to as leftmost computation rule, is clearly a local computation rule. 
Selecting the leftmost atom in all goals guarantees that the selected atom is of 
maximal depth within the proof tree as it is traversed in a depth-first fashion — 
without the need of storing any history about the derivation. 
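
As a small illustration of ancestor depth and of the local condition, the following Python sketch checks a single selection against the goal it is taken from. It reuses the atom numbering of the running example in Figure 2; the parent map, the function names, and the representation of a goal as a list of atom identifiers are assumptions of the sketch.

```python
# Parent of each numbered atom in the running example (from Figure 2).
PARENT = {2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 6, 8: 6, 9: 4, 10: 4, 11: 4}


def depth(atom, parent):
    """Ancestor depth: number of ancestors of `atom` in the proof tree."""
    d = 0
    while atom in parent:
        atom = parent[atom]
        d += 1
    return d


def is_local_selection(selected, goal, parent):
    """Local condition for one goal: the selected atom has maximal depth."""
    return all(depth(selected, parent) >= depth(a, parent) for a in goal)


goal = [5, 6, 3, 4]   # third goal of the example derivation
```

In this goal both atom 5 (leftmost) and atom 6 (its sibling) are valid local selections, while selecting atom 3 would violate the local condition.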

It is interesting to note that we can allow more flexible computation rules which
are not necessarily local while still ensuring termination, at the cost of no accuracy
assurance. A more detailed discussion on this appears later in the article.

An instrumental observation in our approach is that the proof trees which are 
used in order to capture the ancestor relation can be seen as (a simplified version 
of) the activation trees (Aho et al. 1986) used in compiler theory for representing
program executions, by simply regarding selected atoms as procedure calls. The 
nodes in such activation trees are activation records, which contain information 
about local variables, the current program counter, the return address, etc. of the 
corresponding call. Nested subprogram calls result in children activation records. In 
the vast majority of programming languages, execution of a program corresponds to 
traversing activation trees in a depth-first fashion. Therefore, for efficiency, rather 
than maintaining the whole activation tree in memory, run-time systems for execu- 
tion of such programming languages feature a call stack where activation records 
are stored. This call stack contains exactly the sequence of activation records which 
are active at any point in time during the execution. This implementation strategy 
requires that new activation records be added to the call stack as soon as a new 
subprogram is called and that the top of the call stack is popped when the execution 
of a subprogram returns. 

Our idea then is to maintain during unfolding an ancestor stack, whose elements 
are the ancestors of a goal, instead of a full proof tree. The advantages of this are 
clear: since the ancestor stack corresponds to a single branch in the proof tree from 
the current selected atom to all its ancestors in the proof tree, maintaining it should 
offer significant performance improvements both in terms of memory and time
efficiency. As in the case of control stacks, in order to compute ancestor stacks we
need to determine exactly when each ancestor should be pushed to and popped from
the ancestor stack. The first part is relatively simple: any resolution step requires
pushing its associated selected atom. The second part, i.e., popping elements from 
the stack, is more complicated since we need to know when the computation of 
the associated call (or subprogram) is finished. In logic programming terminology 
this corresponds to determining the (partial) success states for all atoms in the 
derivation. In principle, success states for individual atoms are not observable in 
SLD resolution, except for the top-level query. As a result, and as we discuss below, 
some changes in the operational semantics will be needed in order to make this 
information explicit. 

Another important observation which we exploit in this paper is that the idea of 
using a stack for storing the active part of a tree does not need to be restricted to 
leftmost computation and it works equally well as long as the computation rule is 
local. Indeed, sibling atoms, i.e., with the same ancestor depth, can be selected in 
any order and the idea of using an ancestor stack still applies. 

4.2 ASLD Resolution: SLD resolution with ancestor stacks

We now propose an easy-to-implement modification to SLD resolution as presented
in Section 2 in which success states for all internal calls are observable, and where
the control word is available at each state. We will refer to this resolution as SLD
resolution with ancestor stacks, or ASLD for short. The proposed modification
involves 1) augmenting goals with an ancestor stack, which at each stage of the
computation contains the control word of the derivation, which corresponds to
the ancestors of the next atom which will be selected for resolution, and 2) adding
pseudo-atoms to the goals used during resolution which mark a scope (i.e., they separate
groups of atoms which are at different depths in the proof tree). In particular,
we use the pseudo-atom ↑ (read as "pop") to indicate the end of a depth scope,
i.e., after it we move up in the proof tree. It is guaranteed not to clash with any
existing predicate name, and its purpose is twofold: 2.1) when a mark is leftmost
in a goal, it indicates that the current state corresponds to the success state for
the call which is now on top of the ancestor stack, i.e., the call is completed, and
the atom on top of the ancestor stack should be popped; 2.2) the atoms within the
scope of the leftmost mark have maximal ancestor depth and thus a local unfolding
strategy can be easily defined in the presence of these pseudo-atoms.

The following two definitions present the derivation rules in our ASLD semantics.
Now, a state $S$ is a tuple of the form $\langle G \mid AS \rangle$ where $G$ is a goal and $AS$ is an
ancestor stack (or stack for short). The stack keeps track of the ancestor atoms
that each newly selected atom needs to be compared to (by means of the wqo being
used). Thus the stack is instrumental in stopping a derivation as
soon as termination of the process can no longer be guaranteed by the wqo being
used. To handle such stacks, we will use the usual stack operations: $\mathit{empty}$, which
returns an empty stack, $\mathit{push}(AS, \mathit{Item})$, which pushes $\mathit{Item}$ onto the stack $AS$, and
$\mathit{pop}(AS)$, which pops an element from $AS$. In addition, we will use the operation
$\mathit{contents}(AS)$, which returns the sequence of atoms contained in $AS$ in the order in
which they would be popped from the stack $AS$ and leaves $AS$ unmodified.
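The four stack operations can be pictured with a minimal Python sketch. It assumes that atoms are plain strings and that the stack is an immutable tuple whose first element is the top; both choices, and the function names, are illustrative and are not taken from the implementation described in the paper.

```python
from typing import Tuple

# An ancestor stack is modelled as an immutable tuple; index 0 is the top.
Stack = Tuple[str, ...]

def empty() -> Stack:
    """Return an empty ancestor stack."""
    return ()

def push(stack: Stack, item: str) -> Stack:
    """Return a new stack with item on top; the argument is left untouched."""
    return (item,) + stack

def pop(stack: Stack) -> Stack:
    """Return a new stack without its topmost atom."""
    if not stack:
        raise IndexError("pop from an empty ancestor stack")
    return stack[1:]

def contents(stack: Stack) -> Tuple[str, ...]:
    """Atoms in the order in which they would be popped (top first)."""
    return tuple(stack)

s = push(push(empty(), "qsort([1,1,1],R,[])"), "partition([1,1],1,L1,L2)")
```

Because every operation returns a new value, `contents(s)` can be consulted for the admissibility test without disturbing the stack, which matches the requirement that contents leaves the stack unmodified.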

Definition 11 (derive)

Let $G = \leftarrow A_1, \ldots, A_R, \ldots, A_k$ be a goal with $A_1 \neq \uparrow$. Let $S = \langle G \mid AS \rangle$ be a
state and $AS$ be a stack. Let $\leq_S$ be a wqo. Let $\mathcal{R}$ be a computation rule and let
$\mathcal{R}(G) = A_R$ with $A_R \neq \uparrow$. Let $C = H \leftarrow B_1, \ldots, B_m$ be a renamed apart clause.
Then $S' = \langle G' \mid AS' \rangle$ is derived from $S$ and $C$ via $\mathcal{R}$ if the following conditions
hold:

$\mathit{Admissible}(A_R, \mathit{contents}(AS), \leq_S)$

$\theta = \mathit{mgu}(A_R, H)$

$G'$ is the goal $\leftarrow \theta(B_1, \ldots, B_m, \uparrow, A_1, \ldots, A_{R-1}, A_{R+1}, \ldots, A_k)$

$AS' = \mathit{push}(AS, A_R)$

The derive rule behaves as the one in Definition 2 but in addition: i) the mark $\uparrow$
("pop") is added to the goal, and ii) a copy of $A_R$ is pushed onto the ancestor stack.
As before, the derive rule is non-deterministic if several clauses in $P$ unify with
the atom $A_R$. However, in contrast to Definition 2, this rule can only be applied
to an atom different from $\uparrow$ if 1) the leftmost atom in the goal is not a $\uparrow$ mark,
and 2) the currently selected atom $A_R$ together with its ancestors constitutes an
admissible sequence. If 1) holds but 2) does not, the derivation is stopped and we
refer to such a derivation as inadmissible or unsafe (see Definition 5).

Definition 12 (pop-derive)

Let $G = \leftarrow A_1, \ldots, A_k$ be a goal with $A_1 = \uparrow$. Let $S = \langle G \mid AS \rangle$ be a state and
$AS$ be a stack. Then $S' = \langle G' \mid AS' \rangle$ with $G' = \leftarrow A_2, \ldots, A_k$ and $AS' = \mathit{pop}(AS)$
is pop-derived from $S$.

The pop-derive rule is used when the leftmost atom in the resolvent is a $\uparrow$
mark. Its effect is to eliminate from the ancestor stack the topmost atom, which
is guaranteed not to belong to the ancestors of any selected atom in any possible
continuation of this derivation.

Note that derive steps w.r.t. a clause which is a fact are always followed by a
pop-derive and thus they can be optimized by not pushing the selected atom $A_R$
onto the stack and not including in the goal a $\uparrow$ mark which would immediately
pop $A_R$ from the stack. They have also been optimized in the implementation
described in Section 6. Next, we present the rule derive-fact, which incorporates this
optimization, although we do not use it for our formal developments in Section 4.3.
Indeed, its inclusion in the semantics would require that rule derive is only applied
if $m > 0$.

Definition 13 (derive-fact)

Let $G = \leftarrow A_1, \ldots, A_R, \ldots, A_k$ be a goal with $A_1 \neq \uparrow$. Let $S = \langle G \mid AS \rangle$ be a
state and $AS$ be a stack. Let $\leq_S$ be a wqo. Let $\mathcal{R}$ be a computation rule and let
$\mathcal{R}(G) = A_R$ with $A_R \neq \uparrow$. Let $C = H.$ be a renamed apart fact. Then $S' = \langle G' \mid AS \rangle$
is derived from $S$ and $C$ via $\mathcal{R}$ if the following conditions hold:

$\mathit{Admissible}(A_R, \mathit{contents}(AS), \leq_S)$

$\theta = \mathit{mgu}(A_R, H)$

$G'$ is the goal $\leftarrow \theta(A_1, \ldots, A_{R-1}, A_{R+1}, \ldots, A_k)$

Computation for a query $G$ starts from the state $S_0 = \langle G \mid \mathit{empty} \rangle$. Given a non-empty
derivation $D$, we denote by $\mathit{curr\_goal}(D)$ and $\mathit{curr\_ancestors}(D)$ the goal and
the stack in the last state in $D$, respectively. At each step of a derivation $D$ at
most one rule, either derive, derive-fact or pop-derive, can be applied.
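To make the interplay of the three rules concrete, the following Python sketch runs a leftmost ASLD derivation over a propositional program. It rests on strong simplifying assumptions that are not part of the formal semantics: atoms are strings without arguments (so no mgu is computed), each predicate has a single clause, the mark is written `"^"`, and the admissibility test is replaced by a stand-in that rejects an atom equal to one of its ancestors. The names `PROGRAM`, `derive`, `pop_derive` and `run` are hypothetical.

```python
POP = "^"  # stands for the pop mark that closes a depth scope

def admissible(atom, ancestors):
    # Stand-in for the wqo test: reject an atom that repeats an ancestor.
    return all(b != atom for b in ancestors)

def derive(state, r, body):
    """derive / derive-fact on the atom at index r, given the clause body."""
    goal, stack = state
    atom = goal[r]
    assert goal[0] != POP and atom != POP
    if not admissible(atom, stack):
        return None                      # inadmissible: the derivation stops
    rest = goal[:r] + goal[r + 1:]
    if not body:                         # derive-fact: no push and no mark
        return (rest, stack)
    return (tuple(body) + (POP,) + rest, (atom,) + stack)

def pop_derive(state):
    """Drop the leftmost mark and the topmost ancestor."""
    goal, stack = state
    assert goal[0] == POP
    return (goal[1:], stack[1:])

def run(program, query):
    """Leftmost derivation from <query | empty>; returns the list of states."""
    state = (tuple(query), ())
    trace = [state]
    while state[0]:
        goal = state[0]
        if goal[0] == POP:
            state = pop_derive(state)
        else:
            state = derive(state, 0, program[goal[0]][0])
            if state is None:
                break
        trace.append(state)
    return trace

# p :- q, r.   q.   r :- s.   s.
PROGRAM = {"p": [("q", "r")], "q": [()], "r": [("s",)], "s": [()]}
trace = run(PROGRAM, ["p"])
```

In the resulting trace, the stack is `("r", "p")` exactly while `s` is selectable, and both ancestors are popped again by the two closing pop-derive steps. With the looping program `{"p": [("p",)]}` the second selection of `p` is rejected by the stand-in test and the derivation stops.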

Example 1 

Figure 4 illustrates the ASLD derivation corresponding to the derivation with
explicit ancestor annotations of Figure 5. Sometimes, rather than writing the atoms
themselves, we use the same numbers assigned to the corresponding atoms in
Figure 2. By abuse of notation, we again always use the same number assigned to
an atom although further instantiation is performed. The stack contains the list
of atoms exactly in the instantiation state they have when they are pushed onto the
stack. Each step has been appropriately labeled with the applied derivation rule.
Although rule external-derive has not been presented yet, we can just assume that
the code for the external predicate =< is available and has the expected behavior.

It should be noted that, in the last state, the stack contains exactly the ancestors
of partition([1],1,L1',L2'), i.e., the atoms 4 and 1, since the previous calls
to partition have already finished and thus their corresponding atoms have been
popped off the stack. Thus, the admissibility test for partition([1],1,L1',L2')
succeeds, and unfolding can proceed further without endangering termination. Indeed,
the derivation can be totally unfolded, which results in the following (optimal)
partial evaluation in which all input data have been satisfactorily consumed:

qsort([1,1,1],[1,1,1],[]).

Finally, since the goals obtained by ASLD resolution may contain atoms of the form
$\uparrow$, resultants are cleaned up before being transferred to the global control level or
during the code generation phase by simply eliminating all atoms of the form $\uparrow$.

It is easy to see that for each ASLD derivation $D_S$ there is a corresponding
SLD derivation $D$ with the same computed answer and the same goal without
the $\uparrow$ atoms. Such an SLD derivation is the one obtained by performing the same
derive steps (with exactly the same clauses) using the same computation rule and
by ignoring the pop-derive steps, since goals in SLD resolution do not contain $\uparrow$
atoms. We use $\mathit{simplify}(D_S) = D$ to denote that $D$ is the SLD derivation which
corresponds to $D_S$.
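The correspondence can be sketched in a few lines of Python. The representation is an assumption made only for illustration: a derivation is a list of `(rule, goal)` pairs, goals are tuples of strings, and the mark is written `"^"`. The helper `strip_marks` is also what the clean-up of resultants amounts to.

```python
POP = "^"  # stands for the pop mark

def strip_marks(goal):
    """Remove every pop mark from a goal (also used to clean up resultants)."""
    return tuple(atom for atom in goal if atom != POP)

def simplify(asld_derivation):
    """Map an ASLD derivation to the goals of the corresponding SLD derivation.

    asld_derivation is a list of (rule, goal) pairs, where rule is the name
    of the rule that produced the goal (None for the initial state).
    """
    return [strip_marks(goal)
            for rule, goal in asld_derivation
            if rule != "pop-derive"]          # pop-derive has no SLD counterpart

# ASLD derivation for the propositional program  p :- q, r.  q.  r :- s.  s.
asld = [
    (None, ("p",)),
    ("derive", ("q", "r", "^")),
    ("derive-fact", ("r", "^")),
    ("derive", ("s", "^", "^")),
    ("derive-fact", ("^", "^")),
    ("pop-derive", ("^",)),
    ("pop-derive", ()),
]
sld_goals = simplify(asld)
```

Here `sld_goals` is the usual SLD sequence of resolvents for the query `p`, ending in the empty goal.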

4.3 Accuracy results

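We would now like to impose a condition on the computation rule which ensures
that the contents of the stack are precisely the ancestors of the atom to be
selected. The following notion of depth-preserving computation rule allows precisely
this.

$\langle \mathtt{qs([1,1,1],R,[])} \mid [\,] \rangle$
$\downarrow$ derive
$\langle 2, 3, 4, \uparrow \; \mid \; [\mathtt{qsort([1,1,1],R,[])}] \rangle$
$\downarrow$ derive
$\langle 5, 6, \uparrow, 3, 4, \uparrow \; \mid \; [\mathtt{part([1,1],1,L1,L2)}, \mathtt{qs([1,1,1],R,[])}] \rangle$
$\downarrow$ external-derive
$\langle 6, \uparrow, 3, 4, \uparrow \; \mid \; [\mathtt{part([1,1],1,L1,L2)}, \mathtt{qs([1,1,1],R,[])}] \rangle$
$\downarrow$ derive
$\langle 7, 8, \uparrow, \uparrow, 3, 4, \uparrow \; \mid \; [\mathtt{part([1],1,L,L2)}, \mathtt{part([1,1],1,L1,L2)}, \mathtt{qs([1,1,1],R,[])}] \rangle$
$\downarrow$ external-derive
$\langle 8, \uparrow, \uparrow, 3, 4, \uparrow \; \mid \; [\mathtt{part([1],1,L,L2)}, \mathtt{part([1,1],1,L1,L2)}, \mathtt{qs([1,1,1],R,[])}] \rangle$
$\downarrow$ derive-fact
$\langle \uparrow, \uparrow, 3, 4, \uparrow \; \mid \; [\mathtt{part([1],1,L,L2)}, \mathtt{part([1,1],1,L1,L2)}, \mathtt{qs([1,1,1],R,[])}] \rangle$
$\downarrow$ pop-derive
$\langle \uparrow, 3, 4, \uparrow \; \mid \; [\mathtt{part([1,1],1,L1,L2)}, \mathtt{qs([1,1,1],R,[])}] \rangle$
$\downarrow$ pop-derive
$\langle 3, 4, \uparrow \; \mid \; [\mathtt{qsort([1,1,1],R,[])}] \rangle$
$\downarrow$ derive-fact
$\langle 4, \uparrow \; \mid \; [\mathtt{qsort([1,1,1],R,[])}] \rangle$
$\downarrow$ derive
$\langle \mathtt{part([1],1,L1',L2')}, 10, 11, \uparrow, \uparrow \; \mid \; [\mathtt{qsort([1,1],R,[1])}, \mathtt{qsort([1,1,1],R,[])}] \rangle$

Fig. 4. ASLD Derivation for the example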

Definition 14 (depth-preserving)

A computation rule $\mathcal{R}$ is depth-preserving if for each non-empty goal $G = \leftarrow A_1, \ldots, A_k$
with $A_1 \neq \uparrow$, $\mathcal{R}(G) = A_R$ and $\uparrow \notin \{A_2, \ldots, A_R\}$.

Intuitively, a depth-preserving computation rule always returns an atom which is
strictly to the left of the first (leftmost) $\uparrow$ mark. Note that $\uparrow$ is used to separate
groups of atoms which are at different depths in the proof tree. Thus, the notion
of depth-preserving computation rules in ASLD resolution is equivalent to that of
local computation rules in SLD resolution.
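As a small illustration, the set of positions that a depth-preserving rule may select can be computed by scanning the goal up to its first mark. The sketch below assumes goals are tuples of strings with the mark written `"^"`; the function names are hypothetical.

```python
POP = "^"  # stands for the pop mark

def selectable_indices(goal):
    """Indices of the atoms strictly to the left of the leftmost mark."""
    indices = []
    for i, atom in enumerate(goal):
        if atom == POP:
            break
        indices.append(i)
    return indices

def is_depth_preserving_choice(goal, r):
    """True if selecting position r respects the depth-preserving condition."""
    return r in selectable_indices(goal)

goal = ("7", "8", "^", "^", "3", "4", "^")  # a goal shaped like those in Fig. 4
```

For this goal only positions 0 and 1 (atoms 7 and 8) may be selected; choosing atom 3 at position 4 would jump over two marks. Leftmost selection always picks position 0 and is therefore depth-preserving whenever the leftmost atom is not a mark.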

Proposition 1 (ancestor stack) 

Let $D_S$ be an ASLD derivation for the initial query $G$ in program $P$ via a depth-preserving
computation rule. Let $D$ be an SLD derivation such that $\mathit{simplify}(D_S) = D$.
If $\mathit{curr\_goal}(D_S) = A_1, \ldots, A_n, \uparrow$ and $\mathit{curr\_ancestors}(D_S) = AS$, we
distinguish two cases:

• if $A_1 \neq \uparrow$, then $\mathit{contents}(AS) = \mathit{Ancestors}(A_i, D)$ for $A_i \neq \uparrow$ for $i = 1, \ldots, n$,

• if $A_1 = \uparrow$, then the atom on the top of $AS$ has no descendants in $\mathit{curr\_goal}(D_S)$
and $\mathit{contents}(\mathit{pop}(AS)) = \mathit{Ancestors}(A_i, D)$ for $A_i \neq \uparrow$ for $i = 2, \ldots, n$.

Proof

The proof is by induction on the length $k$ of the ASLD derivation, $D_S$, of the form
$S_0, \ldots, S_k$ where $S_i$, for $i = 0, \ldots, k$, is the sequence of states corresponding to each
derivation step from the initial state $S_0 = \langle G \mid \mathit{empty} \rangle$. To simplify the proof, we
do not make explicit distinction between rules derive and derive-fact.

base case ($k = 1$). Consider the initial state $S_0 = \langle G \mid \mathit{empty} \rangle$ where the goal
$G$ is of the form $\leftarrow A_1, \ldots, A_R, \ldots, A_n$, $n \geq 1$. Initially, all atoms in $G$ are
different from $\uparrow$, i.e., $A_i \neq \uparrow$ for $i = 1, \ldots, n$. Therefore, we can only apply
rule derive to $S_0$. Let us assume that $\mathcal{R}$ is a depth-preserving computation rule
and $\mathcal{R}(G) = A_R$. Let $C = H \leftarrow B_1, \ldots, B_m$ be a renamed apart clause with
$\theta = \mathit{mgu}(A_R, H)$. The test $\mathit{Admissible}(A_R, \mathit{contents}(\mathit{empty}), \leq_S)$ holds (otherwise the
derivation step is not possible). Then, the state $S_1 = \langle G' \mid AS' \rangle$ is derived from
$S_0$ and $C$ where $G' = \theta(B_1, \ldots, B_m, \uparrow, A_1, \ldots, A_{R-1}, A_{R+1}, \ldots, A_n)$ and
$AS' = \mathit{push}(\mathit{empty}, A_R)$.

Now, we want to prove that $\mathit{contents}(\mathit{push}(\mathit{empty}, A_R)) = \mathit{Ancestors}(B_i, D)$,
$i = 1, \ldots, m$, for the equivalent SLD derivation $D$. Hence, we perform the
corresponding SLD step from $\leftarrow A_1, \ldots, A_R, \ldots, A_n$ using the same computation
rule $\mathcal{R}$ and the same clause $C$. In $D$, we derive the goal:

$\theta(B_1, \ldots, B_m, A_1, \ldots, A_{R-1}, A_{R+1}, \ldots, A_n)$

By definition of ancestor (Def. 6), $A_R$ is the only ancestor of $B_i$ in $D$, $i = 1, \ldots, m$.
Consequently, $\mathit{contents}(\mathit{push}(\mathit{empty}, A_R)) = \mathit{Ancestors}(B_i, D)$ holds and our claim
follows.

inductive case ($k > 1$). We decompose the ASLD derivation $D_S$ of length $k$ in
two parts. The first part, $D_{S-1}$, is the derivation from $S_0$ to $S_{k-1}$ of length $k - 1$.
The second part corresponds to the last ASLD derivation step from $S_{k-1}$ to $S_k$.
Let $S_{k-1} = \langle G_{k-1} \mid AS_{k-1} \rangle$ with $G_{k-1} = A_1, \ldots, A_n, \uparrow, \ldots$ and $A_i \neq \uparrow$ for
$i = 1, \ldots, n$. We now distinguish two cases depending on the value of $n$:

($n > 0$): We first apply the inductive hypothesis to the ASLD derivation, $D_{S-1}$,
of length $k - 1$ of the form $S_0, \ldots, S_{k-1}$. Consider that $D'$ is the equivalent
SLD derivation obtained by $\mathit{simplify}(D_{S-1}) = D'$. Now, we perform
the last ASLD derivation step from $S_{k-1}$. Since $A_1 \neq \uparrow$, we can only
apply rule derive to $S_{k-1}$. By assumption, $\mathcal{R}$ is a depth-preserving
computation rule. Thus, it will select an atom $A_R$ from $A_1$ to $A_n$. In
particular, assume that $\mathcal{R}(G_{k-1}) = A_R$. Let $C = H \leftarrow B_1, \ldots, B_m$ be a
renamed apart clause with $\theta = \mathit{mgu}(A_R, H)$. We assume that the test
$\mathit{Admissible}(A_R, \mathit{contents}(AS_{k-1}), \leq_S)$ holds, otherwise the step is not
possible. Then, $S_k = \langle G_k \mid AS_k \rangle$ is derived from $S_{k-1}$ and $C$ where

$G_k = \theta(B_1, \ldots, B_m, \uparrow, A_1, \ldots, A_{R-1}, A_{R+1}, \ldots, A_n, \uparrow, \ldots)$

$AS_k = \mathit{push}(AS_{k-1}, A_R)$

Now, we want to prove that $\mathit{contents}(AS_k) = \mathit{Ancestors}(B_i, D)$, for
$i = 1, \ldots, m$, for the equivalent SLD derivation $D$. Hence, we perform the
corresponding SLD step from the last goal, named $Q$, in $D'$. We know that
$Q$ is of the form $Q = A_1, \ldots, A_n, \ldots$ since $\mathit{simplify}(D_{S-1}) = D'$ and all
$A_i \neq \uparrow$. By using the same local computation rule for SLD resolution,
the selected atom is also $A_R$. With the same clause $C$, we derive the goal
$\theta(B_1, \ldots, B_m, A_1, \ldots, A_{R-1}, A_{R+1}, \ldots, A_n, \ldots)$. Now, by applying
Definition 6, the ancestors of $B_i$ are $A_R$ plus the ancestors of $A_R$ in $D'$, for
$i = 1, \ldots, m$.

Finally, we proceed to put together the conclusions obtained from the two
derivations. On one hand, we have that $\mathit{contents}(AS_{k-1}) = \mathit{Ancestors}(A_i, D')$,
$i = 1, \ldots, n$. In particular, we have that $\mathit{contents}(AS_{k-1}) = \mathit{Ancestors}(A_R, D')$
for $i = R$. Thus, we have that:

$AS_k = \mathit{push}(AS_{k-1}, A_R)$

$\mathit{contents}(AS_k) = [A_R \mid AS_{k-1}] = \mathit{Ancestors}(B_i, D)$

which proves our claim.

($n = 0$): In this case, the goal is of the form $G_{k-1} = \uparrow, C_1, C_2, \ldots$ By the inductive
hypothesis, we know that the atom on the top of $AS_{k-1}$ has no descendants in
$\mathit{curr\_goal}(D_{S-1})$ and $\mathit{contents}(\mathit{pop}(AS_{k-1})) = \mathit{Ancestors}(C_i, D')$ for $C_i \neq \uparrow$
for $i = 1, \ldots, n$. Now, the only possibility is that $S_k = \langle G_k \mid AS_k \rangle$ is
pop-derived from $S_{k-1}$ with $G_k = C_1, C_2, \ldots$ and $AS_k = \mathit{pop}(AS_{k-1})$. Therefore,
we have that $\mathit{contents}(AS_k) = \mathit{contents}(\mathit{pop}(AS_{k-1})) = \mathit{Ancestors}(C_i, D')$.
Finally, in the equivalent SLD derivation step $D$ from $D'$, no step is
performed as $\mathit{simplify}$ removes the corresponding atom (i.e., the $\uparrow$ mark).
Hence, $\mathit{Ancestors}(C_i, D) = \mathit{Ancestors}(C_i, D')$ and the result holds.

□ 

The above result trivially holds for leftmost unfolding which is always depth- 
preserving. The next theorem guarantees that we do not lose any specialization 
opportunities by using our stack-based implementation for ancestors instead of 
the more complex tree-based implementation, i.e., our proposed semantics will not 
stop "too early" . It is a consequence of the above proposition and the results in 
(Bruynooghe et al. 1992) about wqos.

Theorem 1 (accuracy)

Let $D$ be an SLD derivation for query $G$ in a program $P$ via a local computation
rule. Let $\leq_S$ be a wqo. If the derivation $D$ is safe w.r.t. $\leq_S$ then there exists an
ASLD derivation $D_S$ for $G$ and $P$ via a depth-preserving computation rule such
that $\mathit{simplify}(D_S) = D$.

Proof 

The proof is by contradiction. We consider the safe SLD derivation $D$ of length $k$
for $G$ via a local computation rule $\mathcal{R}$. Trivially, the partial derivation $D'$ of length
$k - 1$ from $G$ to a goal $G'$ is safe.

Now, the assumption is that $D_S$, the ASLD derivation for $S = \langle G \mid \mathit{empty} \rangle$
corresponding to $D$, is not safe. In particular, we consider the partial ASLD derivation,
$D'_S$, from the state $S$ to the state $S'$, such that $\mathit{simplify}(D'_S) = D'$ and from which a
further ASLD derivation step for $S'$ is not safe, i.e., it would result in an inadmissible
derivation. The state $S'$ is of the form $S' = \langle G' \mid AS' \rangle$ with $G' = A_1, \ldots, A_n, \uparrow, \ldots$
and $A_i \neq \uparrow$, for $i = 1, \ldots, n$. By Definition 14, the depth-preserving computation
rule can only select an atom $A_i$, for $i = 1, \ldots, n$.

Since a safe derivation step from $S'$ cannot be performed, the truth value of the
expression:

$\mathit{Admissible}(A_i, \mathit{contents}(AS'), \leq_S)$

is false for any selected atom $A_i$, $i = 1, \ldots, n$. By Definition 4, this means that
$\forall A_i, \exists B \in \mathit{contents}(AS') : B \leq_S A_i$. By applying Proposition 1, we have that the
truth value of $\mathit{Admissible}(A_i, \mathit{Ancestors}(A_i, D'), \leq_S)$ is false as well. Therefore,
$\forall A_i, \exists B \in \mathit{Ancestors}(A_i, D') : B \leq_S A_i$.

Finally, since $\mathit{simplify}(D'_S) = D'$ and all atoms $A_i \neq \uparrow$, $G'$ is a goal of the form
$A_1, \ldots, A_n, \ldots$ The equivalent computation rule, $\mathcal{R}$, can select the same atoms $A_i$.
However, $\mathit{Admissible}(A_i, \mathit{Ancestors}(A_i, D'), \leq_S)$ is false for all $A_i$, for $i = 1, \ldots, n$.
Thus, the last derivation step in $D$ is inadmissible; hence, we have a contradiction.

□ 

Note that since our semantics disables performing any further steps as soon as inad- 
missible sequences are detected, not all local SLD derivations have a corresponding 
ASLD derivation. However, if a local SLD derivation is safe, then its corresponding 
ASLD derivation can be found. 

It is interesting to note that we can allow more flexible computation rules which
are not necessarily depth-preserving while still ensuring termination. For instance,
consider a state $\langle A_1, \ldots, A_n, \uparrow, A_R, \ldots \mid P \rangle$ with $\uparrow \notin \{A_1, \ldots, A_n\}$ and a non
depth-preserving computation rule which selects the atom $A_R$ to the right of the
$\uparrow$ mark. Then, rule derive will check admissibility of $A_R$ w.r.t. all atoms in the stack
$P$. However, the topmost atom of $P$, say $P_1$, is an ancestor only of the atoms $A_i$ to
the left of $A_R$ but it is not an ancestor of $A_R$. The more $\uparrow$ marks the computation
rule jumps over to select an atom, the more atoms the stack will contain which do
not belong to the ancestors of the selected atom and, thus, the more accuracy
and efficiency we lose. In any case, the stack will always be an over-approximation
of the actual set of ancestors of $A_R$.
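The loss of accuracy can be seen in a tiny Python sketch. It assumes, purely for illustration, that atoms are strings and that the wqo is replaced by a crude stand-in which relates two atoms whenever they share a predicate name; the atoms and the helper names below are hypothetical and only loosely modelled on the quicksort example.

```python
def admissible(atom, sequence, leq):
    """False as soon as some atom B in the sequence satisfies B <= atom."""
    return not any(leq(b, atom) for b in sequence)

def same_predicate(b, a):
    # Crude stand-in for a wqo: two atoms are related if they share a name.
    return b.split("(")[0] == a.split("(")[0]

# Stack P when the rule jumps over one mark: its top is not an ancestor
# of the selected atom, which sits to the right of that mark.
stack = ["part([1],1,L,L2)", "qsort([1,1,1],R,[])"]
true_ancestors = ["qsort([1,1,1],R,[])"]
selected = "part([],1,L1,L2)"

ok_with_ancestors = admissible(selected, true_ancestors, same_predicate)
ok_with_stack = admissible(selected, stack, same_predicate)
```

Checking against the true ancestors accepts the atom, whereas checking against the whole stack rejects it. Since the stack is a superset of the ancestors, the stack-based test can only stop earlier, never later, so termination is preserved while some specialization may be lost.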

Our local unfolding rule based on ancestor stacks can be used within any partial
deduction framework, including Conjunctive Partial Deduction (CPD) (De Schreye et al. 1999).
In principle, its use within the CPD framework does not pose any particular difficulty
and our unfolding rule can simply be incorporated as any other strategy within
the method. Indeed, the main distinction of CPD w.r.t. non-conjunctive methods
lies in the use of an enhanced global control which generates a set of conjunctions
rather than individual atoms, while any of the existing local control strategies can
be used in combination with such a global control. The only requirement is that
the unfolding rule takes as input a conjunction of atoms rather than a single atom,
which is always a trivial extension. It should be noted that some CPD examples
may require the use of an unfolding rule which is not depth-preserving to obtain
the optimal specialization. As discussed above, we cannot ensure accuracy results
(though we still have correctness) in these cases, but in turn the use of local unfolding
will improve the efficiency of the partial deduction process, as our experimental
results will show later.

5 Assertion-based Unfolding for External Predicates 

Most real-life Prolog programs use predicates which are not defined in the program
(module) being developed. We will refer to such predicates as external. Examples
of external predicates are (1) the traditional "built-in" predicates such as
arithmetic operations (e.g., is/2, <, =<, etc.) and basic input/output facilities, (2)
predicates defined in a different module, (3) predicates written in another
language, etc. This section deals with the difficulties that such external predicates
pose during partial deduction and extends our ASLD semantics to deal with them.

5.1 The notion of evaluable atom

When an atom $A$, such that $\mathit{pred}(A) = p/n$ is an external predicate, is selected
during partial deduction, it is not possible to apply the derive rule in Definition 2
for several reasons. First, we may not have the code defining $p/n$ and, even
if we have it, the derivation step may introduce in the residual program calls to
predicates which are private to the module $M$ where $p/n$ is defined. In spite of
this, if the executable code for the external predicate $p/n$ is available, and under
certain conditions, it can be possible to fully evaluate calls to external predicates at
specialization time. We use $\mathit{Exec}(\mathit{Sys}, M, A)$ to denote the execution of atom $A$ on
a logic programming system $\mathit{Sys}$ (e.g., Ciao or SICStus) in which the module $M$,
where the external predicate $p/n$ is defined, has been loaded. In the case of logic
programs, $\mathit{Exec}(\mathit{Sys}, M, A)$ can return zero, one, or several computed answers for
$M \cup A$ and then execution can either terminate or loop. We will use substitution
sequences (Le Charlier et al. 2002) to represent the outcome of the execution of
external predicates. A substitution sequence is either a finite sequence of the form
$\langle \theta_1, \ldots, \theta_n \rangle$, $n \geq 0$, or an incomplete sequence of the form $\langle \theta_1, \ldots, \theta_n, \bot \rangle$, $n \geq 0$,
or an infinite sequence $\langle \theta_1, \ldots, \theta_i, \ldots \rangle$, $i \in \mathbb{N}^*$, where $\mathbb{N}^*$ is the set of positive
natural numbers and $\bot$ indicates that the execution loops. We say that an execution
universally terminates if $\mathit{Exec}(\mathit{Sys}, M, A) = \langle \theta_1, \ldots, \theta_n \rangle$, $n \geq 0$.

In addition to producing substitution sequences, it can be the case that the
execution of atoms for (external) predicates produces other outcomes such as
side-effects, errors, and exceptions. Note that this precludes evaluating such atoms
at partial evaluation time, since those effects need to be performed
at run-time. A clear example of this is input/output facilities. In order to capture
the requirements which allow executing external predicates at partial deduction
time we now introduce the notion of evaluable atom:

Definition 15 (evaluable) 

Let A be an atom such that pred(A) = p/n is an external predicate defined in 
module M. We say that A is evaluable in a logic programming system Sys if 
Exec(Sys, M, A) satisfies the following conditions: 



1. it universally terminates 

2. it does not produce side-effects 

3. it does not issue errors 

4. it does not generate exceptions 

We also say that an expression E is evaluable if 1) E is an evaluable atom, or 2) 
E is a conjunction of evaluable expressions, or 3) E is a disjunction of evaluable 
expressions. 
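The conditions of Definition 15 can be read as a predicate over a recorded execution outcome. The Python sketch below assumes a hypothetical `Outcome` record that classifies the substitution sequence as finite, incomplete or infinite and flags the other effects; such a record is an idealization, since in general these properties cannot be observed by simply running the atom.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Outcome:
    """Idealized result of running an external atom (hypothetical record)."""
    answers: List[Dict[str, object]] = field(default_factory=list)
    kind: str = "finite"   # "finite", "incomplete" (ends in bottom) or "infinite"
    side_effects: bool = False
    errors: bool = False
    exceptions: bool = False

def universally_terminates(outcome: Outcome) -> bool:
    # Only a finite substitution sequence means universal termination.
    return outcome.kind == "finite"

def is_evaluable(outcome: Outcome) -> bool:
    # The four conditions of Definition 15, checked on the record.
    return (universally_terminates(outcome)
            and not outcome.side_effects
            and not outcome.errors
            and not outcome.exceptions)

finite_failure = Outcome()                       # zero answers, terminates
two_answers = Outcome(answers=[{"X": 1}, {"X": 2}])
looping = Outcome(answers=[{"X": 1}], kind="incomplete")
printing = Outcome(answers=[{}], side_effects=True)
```

Note that finite failure (an empty finite sequence) counts as evaluable, while an execution that yields one answer and then loops does not.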

Clearly, some of the above properties are not computable (e.g., termination is
undecidable in the general case). However, it is often possible to determine some
sufficient conditions (SC) which are decidable and ensure that, if an atom $A$
satisfies such conditions, then $A$ is evaluable. Intuitively, a sufficient condition can be
thought of as a traditional precondition which ensures a certain behavior of the
execution of a procedure provided it is satisfied. Thus, if a call
to an external predicate which is selected during partial
deduction satisfies the sufficient conditions, that call can be executed directly at partial deduction time. To
formalize this, we propose to use the notion of evaluable assertion. Basically, an
evaluable assertion is a pair containing a predicate descriptor and the sufficient
conditions for its instances to be evaluable.

Definition 16 (correct evaluable assertion)

Let $p/n$ be an external predicate defined in module $M$. An evaluable assertion
$(p(X_1, \ldots, X_n), SC)$ is correct for predicate $p/n$ in a logic programming system $\mathit{Sys}$
if, $\forall \theta$:

• the expression $\theta(SC)$ is evaluable, and

• if $\mathit{Exec}(\mathit{Sys}, M, \theta(SC)) = \langle \mathit{id} \rangle$ then $\theta(p(X_1, \ldots, X_n))$ is evaluable.

In principle, assertions have to be provided manually by the supplier of the
(external) code. However, for predicates that are defined in the source language and
use only external predicates for which those assertions are available, existing
analysis tools (like those within the CiaoPP system⁴) are able to infer them in many
practical cases (see (Albert et al. 2006)), as we will discuss later.

One of the advantages of using this kind of assertion is that it makes it possible to 
deal with new external predicates (e.g., written in other languages) in user programs 
or in the system libraries without having to modify the partial evaluator itself. Also, 
the fact that the assertions are co-located with the actual code defining the external 
predicate, i.e., in the module M (as opposed to being in a large table inside the 
partial deduction system) makes it more difficult for the assertion to be left out of 
sync when a modification is made to the external predicate. We believe this to be 
very important to the maintainability of a real application or system library. 



4 In this system, evaluable assertions are called eval assertions.

Example 2

Let us consider the following assertion for the builtin predicate =<:

(A =< B, (arithexpr(A), arithexpr(B)))

which states that if predicate =</2 is called with both arguments instantiated to
a term of type arithexpr, then the call is evaluable in the sense of Definition 15.
In our implementation, we use the "computational assertions" which are part of
the assertion language (Puebla et al. 2000) of CiaoPP, the Ciao system preprocessor
(Hermenegildo et al. 2005), in order to declare evaluable assertions.

The type arithexpr corresponds to arithmetic expressions which, as expected,
are built out of numbers and the usual arithmetic operators. In our implementation
in Ciao, the type arithexpr is expressed as a unary regular logic program. This
allows using the underlying Ciao system in order to effectively decide whether a
term is an arithexpr or not.
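The role of the sufficient condition can be sketched in Python. The term representation is an assumption made for illustration only: numbers are arithmetic expressions, compound expressions are triples `(op, left, right)` over `+`, `-` and `*`, and anything else (for example a string standing for an unbound variable) is not. This is not the regular-type definition used in Ciao; division is deliberately left out so that evaluation cannot raise an error.

```python
import operator

# Hypothetical operator table; division is omitted on purpose.
BINARY_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def is_arithexpr(term):
    """Decide the sufficient condition: is term a ground arithmetic expression?"""
    if isinstance(term, bool):
        return False
    if isinstance(term, (int, float)):
        return True
    return (isinstance(term, tuple) and len(term) == 3
            and isinstance(term[0], str) and term[0] in BINARY_OPS
            and is_arithexpr(term[1]) and is_arithexpr(term[2]))

def evaluate(term):
    """Evaluate a term already known to satisfy is_arithexpr."""
    if isinstance(term, tuple):
        return BINARY_OPS[term[0]](evaluate(term[1]), evaluate(term[2]))
    return term

def try_leq(a, b):
    """Evaluate A =< B at specialization time when the condition holds.

    Returns True or False if the call is evaluable, and None when the
    condition fails, in which case the call would have to be residualized.
    """
    if is_arithexpr(a) and is_arithexpr(b):
        return evaluate(a) <= evaluate(b)
    return None
```

For instance, `try_leq(1, 1)` succeeds as in the running example, whereas `try_leq("X", 1)` returns `None` because the first argument is not sufficiently instantiated.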

5.2 The extension of ASLD resolution 

The following definition extends our ASLD semantics by providing a new rule,
external-derive, for evaluating calls to external predicates. Given a sequence of
substitutions $\langle \theta_1, \ldots, \theta_n \rangle$, we define $\mathit{Subst}(\langle \theta_1, \ldots, \theta_n \rangle) = \{\theta_1, \ldots, \theta_n\}$.

Definition 17 (external-derive)

Let $\mathit{Sys}$ be a logic programming system. Let $G = \leftarrow A_1, \ldots, A_R, \ldots, A_k$ be a goal.
Let $S = \langle G \mid AS \rangle$ be a state and $AS$ a stack. Let $\mathcal{R}$ be a computation rule such
that $\mathcal{R}(G) = A_R$ with $\mathit{pred}(A_R) = p/n$ an external predicate from module $M$. Let
$C$ be an evaluable assertion $(p(X_1, \ldots, X_n), SC)$. Then, $S' = \langle G' \mid AS' \rangle$ is
external-derived from $S$ and $C$ via $\mathcal{R}$ in $\mathit{Sys}$ if:

$\sigma = \mathit{mgu}(A_R, p(X_1, \ldots, X_n))$

$\mathit{Exec}(\mathit{Sys}, M, \sigma(SC)) = \langle \mathit{id} \rangle$

$\theta \in \mathit{Subst}(\mathit{Exec}(\mathit{Sys}, M, A_R))$

$G'$ is the goal $\leftarrow \theta(A_1, \ldots, A_{R-1}, A_{R+1}, \ldots, A_k)$

$AS' = AS$

Notice that, since after computing Exec(Sys, M, A R ) the computation of A R is fin- 
ished, there is no need to push (a copy of) A R into AS and the ancestor stack is 
not modified by the external-derive rule. This rule can be nondeterministic if the 
substitution sequence for the selected atom A R contains more than one element, 
i.e., the execution of external predicates is not restricted to atoms which are deter- 
ministic. The fact that A R is evaluable implies universal termination. This in turn 
guarantees that in any ASLD tree, given a node S in which an external atom has 
been selected for further resolution, only a finite number of descendants exist for S 
and they can be obtained in finite time. 
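A minimal Python sketch of this rule follows. It assumes atoms are tuples `(pred, arg1, ..., argn)`, variables are strings, a substitution is a dict, and the mark is written `"^"`. The callbacks `sc_holds` and `execute`, as well as the external predicate `pick/1` with answers X=1 and X=2, are hypothetical stand-ins for the sufficient-condition check and for Exec.

```python
POP = "^"  # stands for the pop mark

def substitute(theta, atom):
    """Apply substitution theta (a dict) to an atom (pred, arg1, ..., argn)."""
    if atom == POP:
        return atom
    return (atom[0],) + tuple(
        theta.get(arg, arg) if isinstance(arg, str) else arg
        for arg in atom[1:]
    )

def external_derive(state, r, sc_holds, execute):
    """Return the successor states, one per answer, or None if not applicable."""
    goal, stack = state
    atom = goal[r]
    if atom == POP or not sc_holds(atom):
        return None                       # sufficient condition not satisfied
    rest = goal[:r] + goal[r + 1:]
    # One successor per computed answer; the ancestor stack is left untouched.
    return [(tuple(substitute(theta, a) for a in rest), stack)
            for theta in execute(atom)]

def sc_pick(atom):
    return atom[0] == "pick"

def exec_pick(atom):
    return [{atom[1]: 1}, {atom[1]: 2}]

state = ((("pick", "X"), ("p", "X"), POP), (("main", "X"),))
successors = external_derive(state, 0, sc_pick, exec_pick)
```

The two successors differ only in the binding applied to the remaining goal, which illustrates the non-determinism of the rule; an empty answer list would yield no successors, i.e., failure at specialization time.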

Example 3

Consider the Ciao system with the assertion in Example 2 for =</2. Consider also
the atoms 5 and 7, which are of the form 1=<1, in the ASLD derivation of Figure 4.
Both atoms can be evaluated because

$\mathit{Exec}(\mathit{ciao}, \mathit{arithmetic}, (\mathit{arithexpr}(1), \mathit{arithexpr}(1))) = \langle \mathit{id} \rangle$

This is a sufficient condition for $\mathit{Exec}(\mathit{ciao}, \mathit{arithmetic}, (1 =< 1))$ to be evaluable.
Its execution returns $\mathit{Exec}(\mathit{ciao}, \mathit{arithmetic}, (1 =< 1)) = \langle \mathit{id} \rangle$.

:- module(main_prog, [main/2], []).
:- use_module(comp, [long_comp/2], []).

main(X,Y) :- problem(X,Y), q(X).

problem(a,Y) :- ground(Y), long_comp(c,Y).
problem(b,Y) :- ground(Y), long_comp(d,Y).

q(a).

(Modular structure diagram with the modules main_prog, comp, and term_typing.)

Fig. 5. Motivating Example

In addition to the conditions discussed above which allow evaluating atoms for
external predicates at specialization time, an orthogonal issue is that of the
correctness of non-leftmost unfolding in the presence of external predicates. For logic
programs without impure predicates, non-leftmost unfolding is sound thanks to the
independence of the computation rule (see for example (Lloyd 1987)).⁵ Unfortunately,
non-leftmost unfolding poses several problems in the context of full Prolog
programs with impure predicates, where such independence does not hold anymore.
For instance, ground/1 is an impure predicate since, under LD resolution, the goal
ground(X),X=a fails whereas X=a,ground(X) succeeds with computed answer X/a.
Those executions are not equivalent and, thus, the independence of the computation
rule no longer holds. As a result, given the goal $\leftarrow$ ground(X),X=a, if
we allow the non-leftmost unfolding step which binds the variable X in the call to
ground(X), the goal will succeed at specialization time, whereas the initial goal fails
in LD resolution at run-time. The above problem was detected early on (Sahlin 1993)
and it is known as the problem of backpropagation of bindings. Backpropagation
of failure is also problematic in the presence of impure predicates. For instance, $\leftarrow$
write(hello),fail behaves differently from $\leftarrow$ fail,write(hello).
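The order dependence introduced by ground/1 can be reproduced with a toy left-to-right executor in Python. The sketch assumes a drastically simplified setting that is not a Prolog engine: goals are tuples, only `ground/1` and unification of a variable with a constant are supported, and bindings live in a dict.

```python
def run_ld(goals):
    """Execute goals left to right; return the bindings, or None on failure."""
    env = {}
    for goal in goals:
        if goal[0] == "ground":
            if goal[1] not in env:
                return None               # unbound variable: ground/1 fails
        elif goal[0] == "=":
            var, value = goal[1], goal[2]
            if env.setdefault(var, value) != value:
                return None               # clash with an earlier binding
        else:
            raise ValueError("unsupported goal: %r" % (goal,))
    return env

original = [("ground", "X"), ("=", "X", "a")]   # ground(X), X=a
reordered = [("=", "X", "a"), ("ground", "X")]  # X=a, ground(X)
```

Running `original` fails while `reordered` succeeds with X bound to a, which is exactly the discrepancy that backpropagating the binding X=a onto ground(X) would introduce at specialization time.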

In order to illustrate the problem, consider the Ciao program in Fig. 5, which
uses the impure predicate ground/1 and whose modular structure appears to the
right. term_typing is the name of the module in Ciao where ground/1 is defined
and predicate long_comp/2 is imported from the user module comp. Consider a
deterministic unfolding rule and the entry declaration: ":- entry main(X,a).". The
unfolding rule performs an initial step and derives the goal problem(X,a),q(X).
Then, it cannot select the leftmost atom problem(X,a) because its execution
performs a non-deterministic step. In this situation, different decisions can be taken:
a) We can stop unfolding at this point. However, in general, it may be profitable
to unfold atoms other than the leftmost. Interesting computation rules are able to
detect the above circumstances and "jump over" the problematic atom in order to
proceed with the specialization of another atom (in this case q(X)). We can then
decide to b) unfold q(X) but avoid backpropagating bindings or failure onto
problem(X,a). And the final possibility c) is to unfold q(X) while allowing
backpropagation onto problem(X,a). However, this will require that some additional
requirements hold on the atom(s) to the left of the selected one.

5 However, non-deterministic unfolding of non-leftmost atoms can degrade efficiency.

There are several solutions in the literature (see, e.g., (Leuschel 1994; Etalle et al. 1997;
Albert et al. 2002; Leuschel and Bruynooghe 2002; Leuschel et al. 2004)) which
allow unfolding non-leftmost atoms by avoiding the backpropagation of bindings and
failure, i.e., in the spirit of possibility b). Basically the common idea is to represent
the bindings explicitly by using unification (Leuschel 1994) or residual case
expressions (Albert et al. 2002) rather than backpropagating them (and thus applying
them onto leftmost atoms). For our example, by using unification, we can unfold
q(X) and obtain the resultant main(X,a) :- problem(X,a), X=a. This guarantees
that the resulting program is correct, but it definitely introduces some inaccuracy,
since bindings (and failure) generated during unfolding of non-leftmost atoms are
hidden from atoms to the left of the selected one. The relevant point to note is that
preventing backpropagation, by using one of the existing methods, can be a bad
idea for at least the following reasons:

1. Backpropagation of bindings and failure can lead to an early detection of fail- 
ure, which may result in important speedups. For instance, if we allow back- 
propagating the binding X=a to the left atom, we get rid of the whole (failing) 
computation for problem(b,a) in the residual program. 

2. Backpropagation of bindings can make the profitability criterion for the left- 
most atom to hold, which may result in more aggressive unfolding. In the 
example, by backpropagating, we obtain the atom problem(a,a) which al- 
lows a deterministic computation rule to proceed to its unfolding. 

3. Backpropagation of bindings may allow better indexing by further instantiat- 
ing arguments in clause heads. This is often good from a performance point of 
view (see, e.g., (Venken and Demoen 1988)). In our example, we will obtain
the clause head main(a,a) with better indexing than main(X,a).

The bottom line is that backpropagation should be avoided only when it is really 
necessary since interesting specializations can no longer be achieved when it is 
disabled. 
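
To make the difference between possibilities b) and c) concrete, the following Python sketch (an illustration only, not part of the partial evaluator) builds both resultants for the running example. It assumes that unfolding q(X) yields the single binding X=a; the term encoding and all function names are hypothetical.

```python
# Illustrative sketch: atoms are (name, args) pairs, variables are
# capitalized strings, constants are lowercase strings.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def subst(atom, binding):
    """Apply a variable binding to the arguments of an atom."""
    name, args = atom
    return (name, tuple(binding.get(a, a) if is_var(a) else a for a in args))

def show(atom):
    name, args = atom
    return f"{name}({','.join(args)})"

def unfold_nonleftmost(head, left_atoms, binding, backpropagate):
    """Resultant obtained after a non-leftmost unfolding step produced `binding`."""
    if backpropagate:
        # Possibility c): bindings reach the head and the atoms to the left.
        new_head = subst(head, binding)
        body = [show(subst(a, binding)) for a in left_atoms]
    else:
        # Possibility b): bindings are kept as explicit residual unifications.
        new_head = head
        body = [show(a) for a in left_atoms]
        body += [f"{v}={t}" for v, t in sorted(binding.items())]
    return f"{show(new_head)} :- {', '.join(body)}."

head = ("main", ("X", "a"))
left = [("problem", ("X", "a"))]
binding = {"X": "a"}  # assumed result of unfolding q(X)

print(unfold_nonleftmost(head, left, binding, backpropagate=False))
# prints: main(X,a) :- problem(X,a), X=a.
print(unfold_nonleftmost(head, left, binding, backpropagate=True))
# prints: main(a,a) :- problem(a,a).
```

In the second resultant the left atom has become problem(a,a) and the head is more instantiated, which is what reasons 2 and 3 above rely on.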

The problems involved in and some possible solutions to non-leftmost unfolding
can be found in the literature (Leuschel 1994; Etalle et al. 1997; Albert et al. 2002;
Leuschel and Bruynooghe 2002). However, there is still ample room for improvements.
In particular, the intensive use of static analysis techniques in this assertion-based
context seems particularly promising. We are investigating the use of the
analyzers available in CiaoPP with this aim, though this is outside the scope of this
article.

:- module(_, [p/2]).

:- use_package(library(assertions)).

p(Y,L) :- findall(X, property(X), L), \+ r(Y).

:- trust pred property/1 + (eval, sideff(free)).
property(X) :- q(X), \+ r(X).

q(a). q(b).

:- trust pred r/1 + (eval, sideff(free)).
r(b).

Fig. 6. Program with meta-calls

5.3 Handling of meta-predicates

Though not introduced in the formalization for simplicity, our partial evaluator 
can handle the usual Prolog meta-predicates, such as call/1, findall/3, bagof/3,
and setof/3. Meta-predicates are characterized by receiving one or more atoms
as input. For example, call/1 receives an atom as its only input and findall/3
receives a goal in its second argument position. The simplest possible handling 
of meta-predicates consists in residualizing all meta-calls, i.e., all calls to meta- 
predicates, and transferring the atoms which appear as arguments in such meta- 
calls to the global control for their subsequent partial evaluation. For this, all meta- 
predicates must be declared as such and the arguments which contain atoms must 
be known in advance. In the case of Ciao this is done using assertions. 

As a further optimization, when the atoms which appear in meta-calls are evalu- 
able, then rather than residualizing the meta-call, our partial evaluator evaluates 
both the atom itself and also the call to the meta-predicate. This is an important
optimization because partial evaluation loses a lot of precision when unfolding is 
stopped and atoms are transferred to the global control. 

Another important feature of Prolog programs is negation as failure, i.e., the
\+/1 meta-predicate. In order to preserve the semantics of negation as failure,
evaluation of a meta-call of the form \+ A requires A to be ground. Therefore, at
partial evaluation time a meta-call \+ A is only evaluated if A is both evaluable and
ground. If this is not the case, the meta-call is residualized and A is transferred to the
global control. This allows a relatively simple handling of negation as failure where
\+/1 is considered as a meta-predicate with the additional evaluation requirement
that its associated atom is ground.
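
The decision just described can be summarized by the following Python sketch. It is a simplified illustration and not the actual implementation: atoms are encoded as tuples, the set of evaluable predicates stands in for the information given by assertions, and evaluation is reduced to a lookup in a set of ground facts (all of these are assumptions made for the example).

```python
# Illustrative sketch: an atom is a tuple (name, arg1, ..., argN);
# variables are capitalized strings.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def is_ground(t):
    if is_var(t):
        return False
    if isinstance(t, tuple):
        return all(is_ground(a) for a in t[1:])
    return True

def specialize_negation(atom, evaluable, facts):
    """Decide how to treat the meta-call `\\+ atom` at partial evaluation time.

    Returns "true" or "fail" when the call can be evaluated, and
    "residualize" when it must be kept in the residual program.
    """
    key = (atom[0], len(atom) - 1)          # predicate name and arity
    if key in evaluable and is_ground(atom):
        # Evaluable and ground: the negation can be decided now.
        return "fail" if atom in facts else "true"
    # Otherwise keep the meta-call; the atom goes to the global control.
    return "residualize"

evaluable = {("r", 1)}      # e.g. declared via a trust assertion with eval
facts = {("r", "b")}        # r(b).

print(specialize_negation(("r", "a"), evaluable, facts))   # prints: true
print(specialize_negation(("r", "b"), evaluable, facts))   # prints: fail
print(specialize_negation(("r", "Y"), evaluable, facts))   # prints: residualize
```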

Example 4 

Figure 6 shows an example Ciao program containing calls to the findall/3 meta-predicate
and negation as failure. The trust assertions in Ciao syntax inform the
partial evaluator that all calls to the property/1 and r/1 predicates are evaluable.

As a result, the findall(X, property(X), L) meta-call is evaluable and can be
replaced by the unification L=[a]. However, the second meta-call, i.e., \+ r(Y), is
residualized since Y remains a variable at partial evaluation time. The resulting
program obtained by our partial evaluator is shown in Figure 7. Since partially
evaluated atoms are renamed, the specialized version of r(Y) has been renamed to
r_1(Y). The atom p(A,B) keeps its original name since it is an exported predicate,
in order to preserve the module interface.

:- module(_, [p/2]).

:- use_package(library(assertions)).
p(A,[a]) :- \+ r_1(A).

:- trust pred r_1(_1) + (eval, sideff(free)).
r_1(b).

Fig. 7. Partially evaluated program with meta-calls
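
The evaluation performed in Example 4 can be replayed with a few lines of Python. This is only a sketch of the reasoning, with the facts of Figure 6 hard-coded; the function names are hypothetical and the renamed predicate r_1 is taken from Figure 7.

```python
# Facts of the program in Figure 6.
Q_FACTS = ["a", "b"]   # q(a). q(b).
R_FACTS = {"b"}        # r(b).

def property_answers():
    # property(X) :- q(X), \+ r(X).
    # q/1 binds X first, so each negated call is ground and can be evaluated.
    return [x for x in Q_FACTS if x not in R_FACTS]

def residual_clause():
    # findall(X, property(X), L) is evaluated at specialization time,
    # while \+ r(Y) is kept because Y is still a variable.
    answers = "[" + ",".join(property_answers()) + "]"
    return "p(A," + answers + r") :- \+ r_1(A)."

print(property_answers())   # prints: ['a']
print(residual_clause())    # prints: p(A,[a]) :- \+ r_1(A).
```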

6 Experimental Results 

We have implemented in our partial evaluation system the unfolding rule we pro- 
pose, together with other variations in order to evaluate the efficiency of our pro- 
posal. Our partial evaluation system has been integrated in a practical state of the 
art compiler which uses global analysis extensively: the Ciao compiler and, specifically,
its preprocessor CiaoPP (Hermenegildo et al. 2005). For the tests, the whole
system has been compiled using Ciao 1.13 (Bueno et al. 2009). All of our experiments
have been performed on an Intel Core 2 Quad Q9300 at 2.5GHz with 1.95GB
of RAM, running Linux 2.6.28-15. 

The programs used as benchmarks are indicated in the Bench column. They 
are classical programs often used as benchmarks for analysis and partial evaluation 
of logic programs. They are described in more detail below. Since our proposal 
improves the performance of the unfolding process, i.e., the local control, we have 
chosen as benchmarks programs whose partial evaluation performs plenty of unfold- 
ing, since this allows observing the benefits of our proposal better. In particular, 
three of the benchmarks considered: advisor3, query, and zebra can be fully un- 
folded using homeomorphic embedding with ancestors. In the rest of the programs 
we provide initial queries which are partially instantiated in order to show that our 
partial evaluation system also includes global control and can partially evaluate pro- 
grams whose input data is not fully instantiated. Our global control is also based 
on homeomorphic embedding. When a new atom is going to be specialized, we first 
check whether it embeds any of the previously specialized atoms. In that case, the 
new atom is generalized before being specialized by using the most specific generalization
of the new and the embedded atom. Otherwise, the new atom is specialized
as is. For our experiments, we use as input lists whose first part is instantiated to
integers and then the rest of the list is unknown, i.e., just a variable, at partial
evaluation time. In the tables, we add to the name of the benchmark the number of
elements in the input list which are instantiated. For example, nrev_80 should be
interpreted as the well-known naive reverse program together with a query which
has as input a list of the form [1,...,80|T], with T a free variable.

                  Relation            Tree           Stack       Rel/Stack      Tree/Stack
Bench            G       L       G       L       G       L       T       L       T       L
advisor3               103             183             151    0.68    0.68    1.21    1.21
nrev_80          ∞       ∞      38   50622      46    7985       ∞       ∞    6.31    6.34
nrev_43         12     912      10    2804      13     774    1.17    1.18    3.58    3.62
permute_6        4     526       4     651       9     453    1.15    1.16    1.42    1.44
query                   92             102              86    1.07    1.07    1.19    1.19
qsort_80         ∞       ∞   15571  430485   15582   47923       ∞       ∞    7.02    8.98
qsort_23       222     797     229    1615     213     566    1.31    1.41    2.37    2.85
rev_80           3     555       2     581       2     547    1.02    1.01    1.06    1.06
zebra                 1043            1682            1052    0.99    0.99    1.60    1.60

Table 1. Performance of Ancestor Stacks in Terms of Execution Time
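
The global control step described above (an embedding check against the previously specialized atoms, followed by generalization) can be sketched in Python as follows. This is a simplified illustration and not the CiaoPP implementation: terms are encoded as nested tuples, variables as capitalized strings, the plain homeomorphic embedding is used without any refinement, and the fresh variable names G0, G1, ... are assumed not to clash with existing ones.

```python
# Illustrative sketch: a compound term is a tuple (functor, arg1, ..., argN).

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def embeds(s, t):
    """Return True if s is homeomorphically embedded in t."""
    if is_var(s) and is_var(t):
        return True
    if isinstance(t, tuple):
        # Diving: s is embedded in some argument of t.
        if any(embeds(s, arg) for arg in t[1:]):
            return True
        # Coupling: same functor and arity, arguments embedded pairwise.
        if isinstance(s, tuple) and s[0] == t[0] and len(s) == len(t):
            return all(embeds(a, b) for a, b in zip(s[1:], t[1:]))
        return False
    # t is atomic: only an identical constant is embedded in it.
    return not isinstance(s, tuple) and not is_var(t) and s == t

def msg(s, t, table):
    """Most specific generalization (anti-unification) of s and t."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(msg(a, b, table) for a, b in zip(s[1:], t[1:]))
    # Each distinct pair of disagreeing subterms gets one fresh variable.
    if (s, t) not in table:
        table[(s, t)] = "G%d" % len(table)
    return table[(s, t)]

def global_control_step(new_atom, specialized):
    """Generalize new_atom if it embeds a previously specialized atom."""
    for old in specialized:
        if embeds(old, new_atom):
            return msg(new_atom, old, {})
    return new_atom

seen = [("p", ("f", "X"))]                      # p(f(X)) already specialized
growing = ("p", ("f", ("f", "X")))              # p(f(f(X))) embeds p(f(X))
print(global_control_step(growing, seen))       # prints: ('p', ('f', 'G0'))
print(global_control_step(("q", "a"), seen))    # prints: ('q', 'a')
```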

The advisor3 program is a variation of the advisor program in the DPPD (Leuschel 2002a)
library. The query and zebra programs are classical benchmarks for program analysis.
In particular, query performs a query to a small Prolog database and zebra
implements a simple logical puzzle. Program qsort corresponds to the quick-sort 
program shown in the article. The part of the list which is instantiated is not or- 
dered. The rev benchmark is another list reversal program, but now with linear 
complexity, using an accumulator. Finally, permute is a permutation program which 
uses a nondeterministic deletion predicate. Note that two of the programs (nrev 
and qsort) are partially evaluated w.r.t. two different input lists. The smaller of the 
two corresponds to the largest possible partially instantiated list that the partial 
evaluator can handle using the Relation implementation explained below, without 
running out of memory. Importantly, none of advisor3, query, nor zebra can be 
fully unfolded using homeomorphic embedding over the full sequence of selected 
atoms. Also, nrev and, as seen in the running example, qsort are potentially not 
fully unfolded if the input lists contain repetitions unless ancestors are considered. 
In Tables 1 and 2, we compare three different implementations of unfolding
based on homeomorphic embedding with ancestors:

Relation We refer to an implementation where each atom in the resolvent is an- 
notated with the list of atoms which are in its ancestor relation, as done in the 
example in Figure 

Trees This column refers to the implementation where the ancestor relations of 
the different atoms are organized in a proof tree. 

Stacks The column Stacks refers to our proposed implementation based on an- 
cestor stacks. 

6.1 Execution times 

Let us explain the results in Table 1. Times are in milliseconds, measuring runtime,
and are computed as the arithmetic mean of five runs. The partial evaluation time
in each implementation is split into two columns. The first one, labeled G, shows
the time taken by global control. The second one, labeled L, shows the time taken
by local control (i.e., unfolding). The benchmarks nrev_80 and qsort_80 contain
the value ∞ instead of a number in the G and L columns for Relation to indicate
that the partial evaluation system has run out of memory. For each of these
two benchmarks, we have repeated the experiment with the largest possible initial
query that Relation can handle in our system before running out of memory, i.e.,
nrev_43 and qsort_23. Relation is quite efficient in time for those benchmarks
it can handle, though in most cases a bit slower than the one based on stacks. However,
and as can be seen in Table 2, its memory consumption is extremely high, which makes
this implementation inadmissible in practice. Regarding Trees, this implementation,
based on proof trees, has good memory consumption but it is significantly
slower than Relation due to the overhead of traversing the tree for retrieving the
ancestors of each atom.

The last four columns compare the relative specialization times of Relation and 
Trees w.r.t. the Stacks algorithm. It should be observed that these three alterna- 
tives are different implementations of the same local control strategy, and that the 
same global control strategy is used in all three cases. Therefore, exactly the same 
residual programs are obtained in the three cases. As the table shows (with values 
greater than one), Stacks is faster than Trees in all cases. Furthermore, Stacks is 
even faster than the implementation based on explicitly storing all ancestors of all 
atoms (Relation) for most programs, while having a memory consumption com- 
parable to (and in fact, slightly better than) the implementation based on proof 
trees. Two speedups are shown per implementation. One, named L, only considers 
the time required for local control, and the other one, named T, considers the total 
time of global plus local control. The actual speedups w.r.t. Trees range from 1.06 
in the case of rev_80 to 8.98 L (7.02 T) in the case of qsort_80. This variation is 
due to the different shapes which the proof trees can have for the (derivations in 
the) SLD tree. In the case of rev, the speedup is low since the SLD tree consists of 
a single derivation whose proof tree has a single branch. Thus, in this case considering
the ancestor sequence is indeed equivalent to considering the whole sequence of
selected atoms. But note that this only happens for binary clauses. It is also worth
noticing that the speedup achieved by the Stacks implementation increases with
the size of the SLD tree, as can be seen in the two benchmarks which have been
specialized w.r.t. different queries. The overall resulting speedup of our proposed
unfolding rule over other existing ones is significant: over 8 times faster than our
tree-based implementation.

                                            Relative Memory Reduction
Bench        Relation      Trees     Stacks     Relation        Trees
advisor3      1667260     850612     751112         2.22         1.13
nrev_80           mem    1076384     944936            ∞         1.14
nrev_43      56255068    1103980    1041490        54.01         1.06
permute_6    23361920    1959004    1431976        16.31         1.37
query         2764368       8064       7520       367.60         1.07
qsort_80          mem    5660460    5038540            ∞         1.12
qsort_23     11130584     630048     598212        18.61         1.05
rev_80        2552524     144264     139076        18.35         1.04
zebra        26819712     107760     101280       264.81         1.06
Overall                                                ∞         1.15

Table 2. Performance of Ancestor Stacks in terms of Memory Consumption
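
As a worked check of how the speedup columns of Table 1 relate to the raw times, the following Python sketch recomputes the Tree/Stack figures for three benchmarks. The numbers are copied from Table 1; the function and variable names are chosen only for this illustration.

```python
# (tree_G, tree_L, stack_G, stack_L) in milliseconds, copied from Table 1.
TIMES = {
    "nrev_80": (38, 50622, 46, 7985),
    "qsort_80": (15571, 430485, 15582, 47923),
    "rev_80": (2, 581, 2, 547),
}

def speedups(tree_g, tree_l, stack_g, stack_l):
    """Return the (T, L) speedups of Stacks over Trees."""
    local = tree_l / stack_l                           # L: local control only
    total = (tree_g + tree_l) / (stack_g + stack_l)    # T: global plus local
    return round(total, 2), round(local, 2)

for bench, row in TIMES.items():
    print(bench, speedups(*row))
# prints:
# nrev_80 (6.31, 6.34)
# qsort_80 (7.02, 8.98)
# rev_80 (1.06, 1.06)
```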

6.2 Memory consumption 

We have also studied the memory required by the unfolding process. Let us briefly
discuss the figures depicted in Table 2, which represent memory consumption in
number of bytes. It has been measured at each derivation step during the construction
of the ASLD trees. At each step, the resulting numbers for all memory areas (stack,
heap, etc.) have been added and then compared to the previous maximum value,
always taking the larger of the two, thus computing the high water mark, i.e., the
maximum memory required to perform unfolding. The figures show, for each benchmark,
the high water mark minus the memory already in use when the construction
of the SLD tree was started. In order to make these numbers closer to the actual
memory used, garbage collection has remained enabled during the different experiments.
In order to make memory figures comparable, we force garbage collection
just before starting partial evaluation of each benchmark.
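
The measurement procedure can be summarized by the following Python sketch. It only illustrates the bookkeeping: the per-step readings and the area names are made-up values for the example, and how the readings are obtained from the running system is not modeled.

```python
def high_water_mark(samples, baseline):
    """Peak memory over all derivation steps, relative to the baseline.

    samples  -- one dict per derivation step, mapping a memory area
                (e.g. "stack", "heap") to its size in bytes
    baseline -- bytes already in use when construction of the tree starts
    """
    peak = baseline
    for areas in samples:
        total = sum(areas.values())   # add up all memory areas at this step
        peak = max(peak, total)       # keep the larger of the two
    return peak - baseline

# Hypothetical readings for three derivation steps.
steps = [
    {"stack": 10, "heap": 100},
    {"stack": 30, "heap": 150},
    {"stack": 5, "heap": 120},
]
print(high_water_mark(steps, baseline=100))   # prints: 80
```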

In the last row, labeled Overall, we summarize the results for the different bench- 
marks using a weighted mean, which places more importance on those benchmarks 
with relatively larger unfolding figures. We use as weight for each program its actual 
unfolding time/memory. We believe that this weighted mean is more informative
than the arithmetic mean, as, for example, doubling the speed at which a large
unfolding tree is computed is more relevant than achieving this for small trees. 
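
The effect of the weighting can be illustrated with the memory figures of Table 2. The Python sketch below assumes that the weight of each benchmark is its Stacks memory figure; this choice is an assumption made for the example, since the exact weights are not spelled out above. Under that assumption the Trees/Stacks ratios average to about 1.15, which is consistent with the Overall row.

```python
# (trees_bytes, stacks_bytes), copied from Table 2.
TABLE2 = {
    "advisor3": (850612, 751112),
    "nrev_80": (1076384, 944936),
    "nrev_43": (1103980, 1041490),
    "permute_6": (1959004, 1431976),
    "query": (8064, 7520),
    "qsort_80": (5660460, 5038540),
    "qsort_23": (630048, 598212),
    "rev_80": (144264, 139076),
    "zebra": (107760, 101280),
}

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

ratios = [trees / stacks for trees, stacks in TABLE2.values()]
weights = [stacks for _, stacks in TABLE2.values()]   # assumed weighting

overall = weighted_mean(ratios, weights)
plain = sum(ratios) / len(ratios)      # unweighted arithmetic mean

print(round(overall, 2))   # prints: 1.15
print(plain < overall)     # prints: True
```

The weighted figure is higher than the plain mean here because the largest benchmarks (qsort_80 and permute_6) also show above-average reductions.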

As Table 2 shows, the Stacks algorithm has lower memory consumption than either
of the two other algorithms studied for all of the programs. It can be seen that the
amount of memory required by the Relation algorithm precludes its practical
use. Regarding the Stacks algorithm, not only is it significantly faster than
the implementation based on trees, but it also provides a relatively important reduction
(1.15 overall, computed again using a weighted mean) in memory consumption
over Trees, which already has good memory usage.

Altogether, when the results of Table 1 and Table 2 are combined, they provide
evidence that our proposed techniques allow significant speedups while at the same
time requiring somewhat less memory than tree-based implementations and much
less memory than implementations where the ancestor relation is
directly computed. This suggests that our techniques are indeed effective and can
contribute to making partial evaluation a practical tool.

6.3 Comparison with Ecce. Specialization Quality. 

Finally, in Table 3 we compare our implementation with that of a state-of-the-art
partial evaluator and assess the quality of the specialized programs. To do so,
we have also measured the time that it takes to process the same benchmarks using
Leuschel's Ecce (Leuschel 2002a) system. For this, we have used the compiled version
available at http://www.stups.uni-duesseldorf.de/~asap/asap-online-demo/meccedownloads
and run the experiments on the same machine. These execution times are provided
in columns Ecce_L and Ecce_G, which show, respectively, the time taken by the local
and the global control in Ecce. When compared with L and G in Table 1 for the
stack implementation, the results provide evidence that our proposed stack-based
implementation compares quite well with state-of-the-art systems as regards specialization
times. Indeed, the specialization times using our stack-based implementation
are considerably smaller for all benchmarks with high local control times. In those
benchmarks in which Ecce is faster than the Stacks implementation, it is due to the
unfolding rules not being identical, which results in Ecce performing fewer unfolding
steps. Note that performing less unfolding may lead to less specialized programs,
which are often less efficient.

The next columns aim at evaluating the quality of the specialized programs in
Ecce and in our system by comparing their runtimes with those of the original
programs. We have chosen sufficiently large input data and run the original program
(column Orig), the specialized one by our system (column Stacks) and the
specialized one by Ecce (column Ecce) on the same data and the same number of
times and show the aggregated runtime. The last three columns show the speedup
achieved for each benchmark. In particular, O/S and O/E show, respectively, the
speedup of Stacks and Ecce w.r.t. the original program and E/S compares Ecce
against Stacks. It should be observed that in all cases the specialized programs in
both Ecce and Stacks are more efficient than the original ones and in most cases the
gain is significant. The cases in which Stacks performs better than Ecce (e.g., query
and zebra) are because we can fully unfold them in Stacks while Ecce stops the
specialization earlier. Hence, the gain is much larger. It is also important to notice
that in the example advisor3, the specialization obtained by Stacks also performs
more unfolding steps than the one in Ecce. In this case, such additional unfolding
results in an unneeded over-specialization which increases the size of the residual
program and leads to a less efficient execution.

Bench        Ecce_L   Ecce_G     Orig   Stacks     Ecce      O/S      O/E      E/S
nrev_43      …  105600  …  41  …
permute_6        40       20        …        …        …     3.10     1.51     2.06
query            20       90     1106       55      570    20.11     1.94    10.32
qsort_80      85300   269070     1178       15       17    78.53    70.14     1.11
qsort_23        260      900      978       34       34    29.12    29.12     1.00
rev_80           10        …     1132      712      704     1.59     1.61     0.99
zebra           170      300        …        6     1069   384.90     2.30   167.00

Table 3. Comparison with Ecce. Specialization Quality.



7 Related Work and Conclusions 

The development of powerful unfolding rules has received considerable attention
during the last years (Leuschel and Bruynooghe 2002). The most successful techniques
to date are based on two fundamental ingredients:

• the use of a wqo which can be used to guarantee termination while achieving 
very powerful unfoldings, 

• structuring the atoms already visited in each derivation in a tree rather than 
using an unstructured collection, such as a set. 
Among the well-quasi orderings, the homeomorphic embedding (Kruskal 1960; Leuschel and Bruynooghe 2002)
has proved to be very powerful in practice. Regarding the structure to use for vis- 
ited atoms, the notion of ancestors seems to be the best one since it guarantees 
termination while allowing transformations which are strictly more powerful than 
those achievable if unstructured collections are used. 

The use of ancestors for refining sequences of visited atoms was proposed early on
by (Bruynooghe et al. 1992) and significant effort has been devoted to improving the
implementation of ancestors (Martens and De Schreye 1996). However, the combi- 
nation of wqo and ancestors happens to be very inefficient in practice. This is mainly 
due to the fact that dependency information has to be maintained for the individ- 
ual atoms in each derivation. In principle, the use of ancestors should not only 
allow more powerful transformation but also speed up unfolding since it reduces 
the length of sequences for which admissibility has to be checked. Unfortunately, 
maintaining such information about ancestors during the generation of SLD trees 
introduces a costly overhead which can eliminate the theoretical efficiency gains. 

In this work we have proposed ASLD resolution, a novel extension over the SLD 
semantics to incorporate ancestor stacks which can be used as a basis for the efficient 
generation of (incomplete) SLD trees during partial deduction in combination with 
wqo. The main features of the implementation technique and extensions that we 
propose for the ancestor-based local unfolding rule, based on ASLD resolution, are: 
(1) it is parametric w.r.t. the wqo of interest; (2) it can handle logic programs 
with builtins; (3) it is guaranteed to always provide finite trees; (4) it is very easy 
to implement since the ancestor information is simply stored using a stack; (5) 
it provides a very efficient implementation of ancestor information; (6) if certain 
conditions are imposed on the computation rule, then it is as accurate as standard 
(less efficient) unfolding rules based on ancestors. Note that, as is the case
with unfolding rules based on traditional SLD resolution, our semantics can be 
used in combination with a determinacy check which may decide to stop unfolding 
even if termination is guaranteed whenever too many alternative, non-deterministic, 
branches are generated in the SLD tree. 

The unfolding rule proposed in this work has been implemented in the CiaoPP
system (Hermenegildo et al. 2005), the preprocessor of the Ciao programming language.
Experimental results are promising: they provide evidence that our proposed
techniques allow significant speedups while at the same time requiring somewhat
less memory than tree-based implementations and much less memory
than implementations where the ancestor relation is directly computed.
Though specialization time is obviously not as critical as execution time, being able 
to perform powerful specializations in reasonable time can only contribute to the 
practical takeup of partial deduction techniques. 

As for future work, we plan to incorporate in our partial evaluator (embedded
in CiaoPP) the extensions needed to perform Conjunctive Partial Deduction
and to investigate whether local unfolding can be successfully used in this context.
We are also investigating additional solutions for the problems involved in
non-leftmost unfolding for programs with extra-logical predicates beyond those
presented in the literature (Leuschel 1994; Etalle et al. 1997; Albert et al. 2002;
Leuschel and Bruynooghe 2002). In particular, the intensive use of static analysis
techniques in this context seems particularly promising. In our case we can take
advantage of the fact that our partial deduction system is integrated in CiaoPP,
which includes extensive program analysis facilities. A first step in this direction has
been taken in (Albert et al. 2006) by using backwards analysis to infer purity assertions
which determine when a non-leftmost step is safe in the presence of impure
predicates.